Quick Start #
SUSE Linux Enterprise Real Time 12 SP5
SUSE Linux Enterprise Real Time is an add-on to SUSE® Linux Enterprise. It allows you to run tasks which require deterministic real-time processing in a SUSE Linux Enterprise environment.
SUSE Linux Enterprise Real Time meets this requirement by offering several options for CPU and I/O scheduling, CPU shielding, and setting the CPU affinity of processes.
Copyright © 2006–2024 SUSE LLC and contributors. All rights reserved.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or (at your option) version 1.3; with the Invariant Section being this copyright notice and license. A copy of the license version 1.2 is included in the section entitled “GNU Free Documentation License”.
For SUSE trademarks, see http://www.suse.com/company/legal/. All other third-party trademarks are the property of their respective owners. Trademark symbols (®, ™ etc.) denote trademarks of SUSE and its affiliates. Asterisks (*) denote third-party trademarks.
All information found in this book has been compiled with utmost attention to detail. However, this does not guarantee complete accuracy. Neither SUSE LLC, its affiliates, the authors nor the translators shall be held liable for possible errors or the consequences thereof.
1 Product Overview #
If your business can respond more quickly to new information and changing market conditions, you have a distinct advantage over those that cannot. Running your time-sensitive mission-critical applications using SUSE Linux Enterprise Real Time reduces process dispatch latencies and gives you the time advantage you need to increase profits or avoid further financial losses, ahead of your competitors.
1.1 Key Features #
Some of the key features for SUSE Linux Enterprise Real Time are:
Pre-emptible real-time kernel.
Ability to assign high-priority processes.
Greater predictability to complete critical processes on time, every time.
Compared to the standard Linux kernel, which is optimized for overall system performance regardless of individual process response time, the SUSE Linux Enterprise Real Time kernel is tuned toward predictable process response time.
Increased reliability.
Lower infrastructure costs.
Tracing and debugging tools that help you analyze and identify bottlenecks in mission-critical applications.
1.2 Specific Scenario #
SUSE Linux Enterprise Real Time Service Pack 3 supports Virtualization and Docker usage. For details, refer to the “Virtualization Guide”.
2 Installing SUSE Linux Enterprise Real Time #
To install SUSE Linux Enterprise Real Time 12 SP5, start a regular SUSE Linux Enterprise Server 12 SP5 installation and select SUSE Linux Enterprise Real Time 12 SP5 as an add-on product during the installation. Alternatively, if SUSE Linux Enterprise Server is already installed, you can start the Add-on Product installation from YaST or enable SUSE Linux Enterprise Real Time in the YaST SUSE Customer Center configuration. However, you need to manually select the -rt kernel flavor as the default in the YaST Boot Loader configurator.
SUSE Linux Enterprise Real Time always needs a SUSE Linux Enterprise Server 12 SP5 base; it cannot be installed in stand-alone mode. For information on how to install add-on products, see the chapter Installing Add-On Products in the SUSE Linux Enterprise 12 SP5 Deployment Guide, available at http://www.suse.com/doc.
The following sections provide a brief introduction to the tools and possibilities of SUSE Linux Enterprise Real Time.
3 Managing CPU Sets with cset #
In some circumstances, it is beneficial to be able to run
specific tasks only on defined CPUs. For this reason, the Linux
kernel provides a feature called cpuset.
The cpuset feature provides the means to do a so-called “soft
partitioning” of the system. Dedicated CPUs, together
with some predefined memory, work on several tasks.
cset consists of one “super command” called
shield and the “regular commands”
set and proc. The purpose
of the super command shield is to create a
common CPU shielding setup within one step by combining regular
commands.
For more information about options and parameters of the
shield subcommand, view the help by running:
cset help shield
3.1 Setting Up a CPU Shield for a Single CPU #
The command cset provides the high-level
functionality to set up and manipulate CPU sets. An example for setting
up a CPU shield is:
cset shield --cpu=3
This shields CPU3. On a machine with four cores, CPU0 to CPU2 remain unshielded.
3.2 Setting up CPU Shields for Multiple CPUs #
If you need to shield more than one CPU, the argument of the
--cpu option accepts a comma-separated list of CPUs,
including range specifications:
cset shield --cpu=1,3,5-7
On a machine with 8 cores, this command shields CPU1, CPU3, CPU5, CPU6, and CPU7. CPU0, CPU2, and CPU4 remain unshielded.
Already existing CPU shields can be extended by the same command. For example, to add CPU4 to the mentioned CPU set, use this command:
cset shield --cpu=1,3-7
CPU1, CPU3, and CPU5 to CPU7 were already shielded, so only CPU4 is added to the shield. Technically, the command updates the current CPU shield schema. To reduce the number of shielded CPUs, for example to unshield CPU1, use the following command:
cset shield --cpu=3-7
Now only CPU3, CPU4, CPU5, CPU6, and CPU7 are shielded. CPU0, CPU1, and CPU2 are available for system usage.
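The list syntax accepted by --cpu can be illustrated with a small shell sketch. The function expand_cpu_list below is a hypothetical helper, not part of cset. It only expands a list such as 1,3,5-7 into the individual CPU numbers and assumes well-formed input (non-negative integers, ascending ranges).

```shell
# expand_cpu_list is a hypothetical helper for illustration, not a cset command.
# The function body runs in a subshell so the changed IFS stays local.
expand_cpu_list() (
    IFS=,
    out=""
    for item in $1; do
        case $item in
            *-*)
                # Range such as 5-7: walk from the start to the end value.
                cpu=${item%-*}
                end=${item#*-}
                while [ "$cpu" -le "$end" ]; do
                    out="$out $cpu"
                    cpu=$((cpu + 1))
                done
                ;;
            *)
                # Single CPU number.
                out="$out $item"
                ;;
        esac
    done
    echo "${out# }"
)

expand_cpu_list 1,3,5-7   # prints: 1 3 5 6 7
```

Comparing the expanded lists of two calls shows which CPUs an updated shield adds or drops.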
3.3 Showing CPU Shields #
After the CPU shielding is set up you can display the current
configuration by running cset shield without
additional options:
cset shield
cset: --> shielding system active with
cset: "system" cpuset of: 0-2 cpu, with: 47
cset: "user" cpuset of: 3-7 cpu, with: 0
By default, CPU shielding consists of at least three
cpusets:
root always exists and contains all available CPUs.
system is the cpuset of unshielded CPUs.
user is the cpuset of shielded CPUs.
3.4 Shielding Processes #
Certain processes or groups of processes can be assigned to a
shielded cpuset, after the CPU set is created. To start a new
process in the shielded CPU set use the --exec
option:
cset shield --exec APPLICATION
To move already running processes to the shielded CPU set use
the --shield and --pid options. The
--pid option accepts a comma-separated list
of PIDs and range specifications:
cset shield --shield --pid=1,2,600-700
This moves the processes with PID 1 and 2 and those with PIDs from 600 to 700 to the
shielded CPU set. If there are gaps in the range from 600 to 700,
only the processes that exist are moved to the shield,
without warning. cset handles threads like
processes: it also accepts TIDs and assigns them to the
required CPU set.
The --shield option does not check the
processes you request to move into the shield. This means that
the command will move any process, including processes that
are bound to specific CPUs and even kernel threads. You can
cause a complete system lockup by indiscriminately specifying
arbitrary PIDs to the --shield option.
3.5 Showing Shielded Processes #
Use the cset shield command to show the
number of currently shielded processes (the same command can be
used to show the current CPU shield setup). To list shielded and
unshielded processes, add the --verbose
option:
cset shield --verbose
cset: --> shielding system active with
cset: "system" cpuset of: 0-2,4-15 cpu, with:
USER PID PPID S TASK NAME
-------- ----- ----- - ---------
root 1 0 S init [3]
[...]
cset: "user" cpuset of: 3 cpu, with: 1
USER PID PPID S TASK NAME
-------- ----- ----- - ---------
root 10202 10170 S application
3.6 Unshielding Processes #
To remove a process (or group of processes) from the CPU shield, use the
--unshield option. It is used in the same way as the --shield
option: the --pid option accepts a comma-separated list of PIDs/TIDs and
range specifications:
cset shield --unshield --pid=2,650-655
This command unshields the process with PID 2 and the processes with PIDs in the range from 650 to 655.
3.7 Resetting CPU Sets #
To delete the CPU sets again, use the --reset option of
cset shield. This unshields all CPUs and
migrates the shielded processes back to all available CPUs.
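The steps from the previous sections (create a shield, start an application in it, reset the shield) can be strung together in a script. The following is a minimal sketch, not a recommended procedure: run and shield_and_run are hypothetical helper names, APPLICATION is the same placeholder used above, and by default the script only prints the commands instead of executing them.

```shell
# Print each command instead of running it, unless DRY_RUN=0 is set.
# run and shield_and_run are hypothetical helpers for illustration.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

shield_and_run() {
    cpus=$1
    app=$2
    run cset shield --cpu="$cpus"     # create or update the shield
    run cset shield --exec "$app"     # start the application inside the shield
    run cset shield --reset           # remove the shield again afterwards
}

shield_and_run 3 APPLICATION
```

Setting DRY_RUN=0 executes the commands for real, which requires root privileges and an installed cset.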
4 Managing Tree-like Structures with cset #
More detailed configuration of cpusets can be done with the
cset commands set and
proc.
The subcommand set is used to create, modify
and destroy cpusets. Compared to the supercommand
shield, the set subcommand can
additionally assign memory nodes for NUMA machines.
Besides assigning memory nodes, the subcommand
set creates cpusets in a
tree-like structure, rooted at the root cpuset.
To create a cpuset with the subcommand set
you need to specify the CPUs which should be used. Either use a
comma-separated list or a range specification:
cset set --cpu=1-7 "/one"
This command creates a cpuset called one
with the CPUs from CPU1 to CPU7 assigned. To create a new cpuset
called two that is a subset of one,
proceed as follows:
cset set --cpu=6 "/one/two"
Cpusets follow certain rules. Children can only include CPUs
that the parents already have. If you try to assign CPUs that the
parent does not have, the kernel cpuset subsystem will not let you create that
cpuset. For example, if you create a cpuset that contains CPU3,
and then attempt to create a child of that cpuset with a CPU other
than 3, you will get an error, and the cpuset will not be created.
The resulting error is somewhat cryptic and is usually
“Permission denied”.
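Because the error message is not very telling, it can help to check the parent/child rule before calling cset set. The following sketch does this purely on the CPU list strings. Both functions are hypothetical helpers, not part of cset, and they assume well-formed lists; the actual enforcement is still done by the kernel.

```shell
# Hypothetical helpers for illustration, not part of cset.
# Expand a list such as 1,3,5-7 into single CPU numbers.
expand_cpu_list() (
    IFS=,
    out=""
    for item in $1; do
        case $item in
            *-*)
                cpu=${item%-*}
                end=${item#*-}
                while [ "$cpu" -le "$end" ]; do
                    out="$out $cpu"
                    cpu=$((cpu + 1))
                done
                ;;
            *) out="$out $item" ;;
        esac
    done
    echo "${out# }"
)

# Succeed only if every CPU of the child list ($1) is in the parent list ($2).
is_cpu_subset() {
    parent=" $(expand_cpu_list "$2") "
    for cpu in $(expand_cpu_list "$1"); do
        case $parent in
            *" $cpu "*) ;;
            *) return 1 ;;
        esac
    done
    return 0
}

if is_cpu_subset 6 1-7; then echo "6 fits into 1-7"; fi
if ! is_cpu_subset 4 3; then echo "4 does not fit into 3"; fi
```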
To show a table containing useful information, like CPU list
and memory list, use the -r parameter. The
“-X” column shows the exclusive state of CPU or
memory. The “path” column shows the real path in the
virtual cpuset file system.
cset set -r
On NUMA machines, memory nodes can be assigned to a cpuset
in the same way as CPUs. The --mem option of the
subcommand set accepts a comma-separated list of memory nodes,
including inclusive range specifications. This example
assigns MEM1, MEM3, MEM4, MEM5 and MEM6 to the cpuset
new_set:
cset set --mem=1,3-6 new_set
Additionally, the
--cpu_exclusive and
--mem_exclusive options (without any
additional arguments) make the CPUs or memory nodes exclusive to a
cpuset:
cset set --cpu_exclusive "/one"
The exclusive state of CPUs or memory is shown in
the -X column when running:
cset set -r
For more detailed information about options and parameters of
the subcommand set, view the
cset help:
cset help set
After the cpuset is initialized, the subcommand
proc can start processes in a given cpuset
with the --exec option. The following command
starts the application fastapp within the cpuset
new_set:
cset proc --exec --set new_set fastapp
To move an already running process into an existing
cpuset, use the option --move. It accepts a
comma-separated list and range specifications of PIDs. The
following command moves the process with PID
2442 and the processes with PIDs in the range from 3000 to 3200 into the
cpuset new_set:
cset proc --move 2442,3000-3200 new_set
To list the processes running within a specific cpuset,
use the option --list:
cset proc --list new_set
The subcommand proc can also move the entire
list of processes within one cpuset to another cpuset by using the
options --fromset and
--toset. The following command moves all processes assigned to
old_set to
new_set:
cset proc --move --fromset old_set \
--toset new_set
For more detailed information about options and parameters of
the subcommand proc, view the help:
cset help proc
5 Setting Real-time Attributes of a Process with chrt #
Use the chrt command to manipulate the
real-time attributes of an already running process (like
scheduling policy and priority), or to execute a new process with
specified real-time attributes.
Using chrt is highly recommended for applications that do not
set real-time attributes on their own but should nevertheless
benefit from the full advantages of real-time. To achieve this,
start these applications with the
chrt command and a suitable scheduler
policy and priority.
The following command shows all running processes with their
real-time specific attributes. The class column
shows the current scheduler policy and the rtprio column
the real-time priority:
ps -eo pid,tid,class,rtprio,comm
...
1437 1437 FF 40 fastapp
The truncated example above shows the
fastapp process with PID
1437 running with scheduler policy SCHED_FIFO and priority
40. Scheduler policy abbreviations are:
TS - SCHED_OTHER
FF - SCHED_FIFO
RR - SCHED_RR
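The abbreviations can be translated automatically when filtering the ps output. The following sketch keeps only tasks with a real-time policy and prints the full policy name. policy_name and list_rt_tasks are hypothetical helper names, and the sample input is made up to mirror the columns shown above.

```shell
# Hypothetical helpers for illustration.
# Map the ps "class" abbreviation to the scheduler policy name.
policy_name() {
    case $1 in
        TS) echo SCHED_OTHER ;;
        FF) echo SCHED_FIFO ;;
        RR) echo SCHED_RR ;;
        *)  echo "unknown($1)" ;;
    esac
}

# Read "pid tid class rtprio comm" lines on standard input and print
# only the tasks that run with SCHED_FIFO or SCHED_RR.
list_rt_tasks() {
    while read -r pid tid class rtprio comm; do
        case $class in
            FF|RR) echo "$pid $tid $(policy_name "$class") $rtprio $comm" ;;
        esac
    done
}

# Made-up sample input in the column order used above.
printf '%s\n' '1 1 TS - init' '1437 1437 FF 40 fastapp' | list_rt_tasks
# prints: 1437 1437 SCHED_FIFO 40 fastapp
```

On a live system, the same filter can be fed with ps -eo pid,tid,class,rtprio,comm | list_rt_tasks. The header line is dropped automatically because its class field matches neither FF nor RR.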
It is also possible to get the current scheduler policy and
priority of single processes by specifying the PID of the process
with the -p parameter. For example:
chrt -p 1437
Scheduler policies have different minimum and maximum
priority values. Minimum and maximum values for each available
scheduler policy can be retrieved with chrt:
chrt -m
To change the scheduler policy and the priority of a running
process, chrt provides the options
--fifo for SCHED_FIFO,
--rr for SCHED_RR and
--other for SCHED_OTHER.
The following example will change the scheduler policy to
SCHED_FIFO with priority 42 for PID 1437:
chrt --fifo -p 42 1437
Change the real-time attributes of processes with care. Increasing the priority of certain processes can harm the entire system, depending on the behavior of the process. In some cases, this can lead to a complete system lockup or adversely affect certain devices.
For more information about chrt, see the chrt
man page with man 1 chrt.
6 Specifying a CPU Affinity with taskset #
The default behavior of the kernel is to keep a process running on the same CPU if the system load is balanced over the available CPUs. Otherwise, the kernel tries to improve the load balancing by moving processes to an idling CPU. In some situations, however, it is desirable to set a CPU affinity for a given process. In this case, the kernel will not move the process away from the selected CPUs. For example, if you use shielding, the shielded CPUs will not run any processes that do not have an affinity to the shielded CPUs. Another possibility to remove load from the other CPUs is to run all low priority tasks on a selected CPU.
If a task is running inside a specific cpuset, the affinity
mask must match at least one of the CPUs available in this set.
The taskset command will not move a process
outside the cpuset it is running in.
To set or retrieve the CPU affinity of a task a bitmask is
used. It is represented by a hexadecimal number. If you count the
bits of this bitmask, the lowest bit represents the first logical
CPU as they are found in /proc/cpuinfo. For
example:
0x00000001 is processor #0.
0x00000002 is processor #1.
0x00000003 is processor #0 and processor #1.
0xFFFFFFFE is all but the first CPU.
If a given mask does not contain any valid CPU on the system, an error is returned. If taskset returns without an error, the given program has been scheduled to the specified list of CPUs.
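The mapping from CPU numbers to a mask can be reproduced with shell arithmetic. The following sketch sets bit N for each logical CPU N passed as an argument. cpus_to_mask is a hypothetical helper, not part of taskset, and the output is padded to eight hexadecimal digits only to match the examples above.

```shell
# cpus_to_mask is a hypothetical helper for illustration.
# Each argument is a logical CPU number; bit N of the mask stands for CPU N.
cpus_to_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$((mask | (1 << cpu)))
    done
    printf '0x%08X\n' "$mask"
}

cpus_to_mask 0      # prints: 0x00000001
cpus_to_mask 1      # prints: 0x00000002
cpus_to_mask 0 1    # prints: 0x00000003
```

The result can be passed to taskset as the mask argument, for example taskset "$(cpus_to_mask 0 1)" followed by a command of your choice.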
The command taskset can be used to start a new process with a
given CPU affinity, or to redefine the
CPU affinity of an already running process.
- taskset -p PID
Retrieves the current CPU affinity of the process with the given PID.
- taskset -p MASK PID
Sets the CPU affinity of the process with the given PID to MASK.
- taskset MASK COMMAND
Runs COMMAND with a CPU affinity of MASK.
For more detailed information about
taskset, see the man page man 1 taskset.
7 Changing I/O Priorities with ionice #
Handling I/O is one of the critical issues for all high-performance systems. If a task has lots of CPU power available, but must wait for the disk, it will not work as efficiently as it could. The Linux kernel provides three different scheduling classes to determine the I/O handling for a process. All of these classes can be fine-tuned with a nice level.
- The Best Effort Scheduler
The Best Effort scheduler is the default I/O scheduler, and is used for all processes that do not specify a different I/O scheduler class. By default, this scheduler sets its nice level according to the nice value of the running process.
There are eight different nice levels available for this scheduler. The lowest priority is represented by a nice level of seven, the highest priority is zero.
This scheduler has the scheduling class number 2.
- The Real Time Scheduler
The real-time I/O class always gets the highest priority for disk access. The other schedulers will only be served if no real-time request is present. This scheduling class may easily lock up the system if not implemented with care.
The real-time scheduler defines nice levels (similar to the Best Effort scheduler).
This scheduler has the scheduling class number 1.
- The Idle Scheduler
The Idle scheduler does not define any nice levels. I/O is only done in this class if no other scheduler runs an I/O request. This scheduler has the lowest available priority and can be used for processes that are not time-critical.
This scheduler has the scheduling class number 3.
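How the Best Effort class derives its default nice level from the CPU nice value is not spelled out above. According to the ionice man page, the formula is (cpu_nice + 20) / 5 with integer division. The following sketch applies it; default_io_nice is a hypothetical helper name, and the formula is taken from that man page rather than from this document.

```shell
# default_io_nice is a hypothetical helper for illustration.
# Formula as documented in the ionice man page: (cpu_nice + 20) / 5,
# using integer division, which maps CPU nice -20..19 to I/O nice 0..7.
default_io_nice() {
    echo $(( ($1 + 20) / 5 ))
}

default_io_nice 0     # prints: 4
default_io_nice -20   # prints: 0
default_io_nice 19    # prints: 7
```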
To change I/O schedulers and nice values, use the
ionice command. This provides a means to tune
the scheduler of already running processes, or to start new
processes with specific I/O settings.
- ionice -c3 -p$$
Sets the scheduler of the current shell to Idle.
- ionice
Without additional parameters, this prints the I/O scheduler settings of the current shell.
- ionice -c1 -p42 -n2
Sets the scheduler of the process with process ID 42 to Real Time, and its nice value to 2.
- ionice -c3 /bin/bash
Starts the Bash shell with the Idle I/O scheduler.
For more detailed information about
ionice, see the ionice man
page with man 1 ionice.
8 Changing the I/O Scheduler for Block Devices #
The Linux kernel provides several block device schedulers
that can be selected individually for each block device. All but
the noop scheduler perform a kind of ordering
of requested blocks to reduce head movements on the hard disk. If
you use an external storage system that has its own scheduler, you
should disable the Linux internal reordering by selecting the
noop scheduler.
- noop
The noop scheduler is a very simple scheduler that performs basic merging and sorting on I/O requests. This scheduler is mainly used for specialized environments that run their own schedulers optimized for the used hardware, such as storage systems or hardware RAID controllers.
- deadline
The main point of deadline scheduling is to try hard to answer a request before a given deadline. This results in very good I/O for a random single I/O in real-time environments.
In principle, the deadline scheduler uses two lists with all requests. One is sorted by block sequences to reduce seeking latencies, the other is sorted by expire times for each request. Normally, requests are served according to the block sequence, but if a request reaches its deadline, the scheduler starts to work on this request.
- cfq
The Completely Fair Queuing scheduler uses a separate I/O queue for each process. All of these queues get a similar time slice for disk access. With this procedure, the CFQ tries to divide the bandwidth evenly between all requesting processes. This scheduler has a similar throughput as the anticipatory scheduler, but the maximum latency is much shorter.
For the average system this scheduler yields the best results, and thus is the default I/O scheduler on SUSE Linux Enterprise systems.
To print the current scheduler of a block device like
/dev/sda, use the following command:
cat /sys/block/sda/queue/scheduler
noop deadline [cfq]
In this case, the scheduler for /dev/sda
is set to cfq, the Completely Fair
Queuing scheduler. This is the default scheduler on
SUSE Linux Enterprise Real Time.
To change the scheduler, echo one of the names
noop, deadline, or
cfq into
/sys/block/<device>/queue/scheduler. For
example, if you want to set the I/O scheduler of the device
/dev/sda to noop, use
the following command:
echo "noop" > /sys/block/sda/queue/scheduler
To set other variables in the /sys file
system, use a similar approach.
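In scripts, it is often useful to extract only the active scheduler, which the kernel marks with square brackets. The following sketch does this with sed. active_scheduler is a hypothetical helper name, and the sample string is the output shown above.

```shell
# active_scheduler is a hypothetical helper for illustration.
# It prints the name enclosed in square brackets in a scheduler line.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

active_scheduler "noop deadline [cfq]"   # prints: cfq
```

On a live system, pass the file content instead of the sample string, for example active_scheduler "$(cat /sys/block/sda/queue/scheduler)".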
9 Tuning the Block Device I/O Scheduler #
All schedulers, except for the noop
scheduler, have several common parameters that may be tuned for
each block device. You can access these parameters with
sysfs in the
/sys/block/<device>/queue/iosched/
directory. The following parameters are tuneable for the
respective scheduler:
- Anticipatory Scheduler
  - read_batch_expire
    If write requests are scheduled, this is the time in milliseconds that reads are served before pending writes get a time slice. If writes are more important than reads, set this value lower than read_expire.
  - write_batch_expire
    Similar to read_batch_expire, but for write requests.
- Deadline Scheduler
  - read_expire
    The main focus of this scheduler is to limit the start latency for a request to a given time. Therefore, for each request, a deadline is calculated from the current time plus the value of read_expire in milliseconds.
  - write_expire
    Similar to read_expire, but for write requests.
  - fifo_batch
    If a request hits its deadline, it is necessary to move the request from the sorted I/O scheduler list to the dispatch queue. The variable fifo_batch controls how many requests are moved, depending on the cost of each request.
  - front_merges
    The scheduler normally tries to find contiguous I/O requests and merges them. There are two kinds of merges: The new I/O request may be in front of the existing I/O request (front merge), or it may follow behind the existing request (back merge). Most merges are back merges. Therefore, you can disable the front merge functionality by setting front_merges to 0.
  - writes_starved
    In case some read or write requests hit their deadline, the scheduler prefers the read requests by default. To prevent write requests from being postponed forever, the variable writes_starved controls how often read requests are preferred until write requests are preferred over read requests.
- CFQ Scheduler
  - back_seek_max and back_seek_penalty
    The CFQ scheduler normally uses a strict ascending elevator. When needed, it also allows small backward seeks, but it puts some penalty on them. The maximum backward sector seek is defined with back_seek_max, and the multiplier for the penalty is set by back_seek_penalty.
  - fifo_expire_async and fifo_expire_sync
    The fifo_expire_* variables define the timeout in milliseconds for asynchronous and synchronous I/O requests. To prefer synchronous operations over asynchronous ones, the fifo_expire_sync value should be lower than fifo_expire_async.
  - quantum
    Defines the number of I/O requests to be dispatched at once by the block device. This parameter is used for synchronous requests.
  - slice_async, slice_async_rq, slice_sync, and slice_idle
    These variables define the time slices a block device gets for synchronous or asynchronous operations. slice_async and slice_sync serve as a base value in milliseconds for asynchronous or synchronous disk slice length calculations. slice_async_rq defines how many requests an asynchronous disk slice can accommodate. slice_idle defines how long the I/O scheduler idles before servicing the next thread.
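To see which of these parameters exist for a device and what their current values are, you can print every file in the iosched directory. The following is a minimal sketch. show_tunables is a hypothetical helper name, and /dev/sda is only an example device.

```shell
# show_tunables is a hypothetical helper for illustration.
# It prints every regular file in the given directory as name=value.
show_tunables() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        printf '%s=%s\n' "${f##*/}" "$(cat "$f")"
    done
}

# Prints nothing if the device or its iosched directory does not exist.
show_tunables /sys/block/sda/queue/iosched
```

A value can then be changed in the same way as the scheduler itself, by echoing the new value into the respective file as root.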
The system default block device I/O scheduler can also be set
with the kernel parameter elevator=. For example,
elevator=deadline changes the I/O scheduler
to deadline.
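Whether the running kernel was booted with this parameter can be checked on the kernel command line. The following sketch extracts the value from a command line string. elevator_from_cmdline is a hypothetical helper name, and the sample command line is made up for illustration.

```shell
# elevator_from_cmdline is a hypothetical helper for illustration.
# It prints the value of elevator= if the parameter is present.
elevator_from_cmdline() {
    for word in $1; do
        case $word in
            elevator=*) echo "${word#elevator=}" ;;
        esac
    done
}

# Made-up sample command line.
elevator_from_cmdline "root=/dev/sda2 elevator=deadline quiet"   # prints: deadline
```

On a live system, call it as elevator_from_cmdline "$(cat /proc/cmdline)". Empty output means that the parameter is not set.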
10 For More Information #
A lot of information about real-time implementations and administration can be found on the Internet. The following list contains several selected links:
More detailed information about real-time Linux development and an introduction to writing a real-time application can be found in the real-time Linux community Wiki: http://rt.wiki.kernel.org, http://rt.wiki.kernel.org/index.php/HOWTO:_Build_an_RT-application
The cpuset feature of the kernel is explained in /usr/src/linux/Documentation/cgroups/cpusets.txt. More detailed documentation is available from http://lwn.net/Articles/127936/.
For more information about the deadline I/O scheduler, refer to https://en.wikipedia.org/wiki/Deadline_scheduler. In your installed system, find further information in /usr/src/linux/Documentation/block/deadline-iosched.txt.
The CFQ I/O scheduler is covered in detail in http://en.wikipedia.org/wiki/CFQ and /usr/src/linux/Documentation/block/cfq-iosched.txt.