4 SystemTap—filtering and analyzing system data #
  SystemTap provides a command line interface and a scripting language to
  examine the activities of a running Linux system, particularly the kernel,
  in fine detail. SystemTap scripts are written in the SystemTap scripting
  language, are then compiled to C-code kernel modules and inserted into the
  kernel. The scripts can be designed to extract, filter and summarize data,
  thus allowing the diagnosis of complex performance problems or functional
  problems. SystemTap provides information similar to the output of tools
  like netstat, ps,
  top, and iostat. However, more
  filtering and analysis options can be used for the collected information.
 
4.1 Conceptual overview #
   Each time you run a SystemTap script, a SystemTap session is started.
   Several passes are done on the script before it is allowed to run.
   Then, the script is compiled into a kernel module and loaded. If the
   script has been executed before and no system components have changed
   (for example, different compiler or kernel versions, library paths, or
   script contents), SystemTap does not compile the script again. Instead,
   it uses the *.c and *.ko data
   stored in the SystemTap cache (~/.systemtap).
  
The module is unloaded when the tap has finished running. For an example, see the test run in Section 4.2, “Installation and setup” and the respective explanation.
4.1.1 SystemTap scripts #
    SystemTap usage is based on SystemTap scripts
    (*.stp). They tell SystemTap which type of
    information to collect, and what to do once that information is
    collected. The scripts are written in the SystemTap scripting language
    that is similar to AWK and C. For the language definition, see
    https://sourceware.org/systemtap/langref/. A lot of
    useful example scripts are available from
    http://www.sourceware.org/systemtap/examples/.
   
    The essential idea behind a SystemTap script is to name
    events, and to give them handlers.
    When SystemTap runs the script, it monitors for certain events. When an
    event occurs, the Linux kernel runs the handler as a sub-routine, then
    resumes. Thus, events serve as the triggers for handlers to run.
    Handlers can record specified data and print it in a certain manner.
   
The SystemTap language uses only a few data types (integers, strings, and associative arrays of these) and full control structures (blocks, conditionals, loops, functions). It has lightweight punctuation (semicolons are optional) and does not need detailed declarations (types are inferred and checked automatically).
    For more information about SystemTap scripts and their syntax, refer to
    Section 4.3, “Script syntax” and to the
    stapprobes and stapfuncs man
    pages, which are available with the
    systemtap-docs package.
   
4.1.2 Tapsets #
    
    Tapsets are a library of pre-written probes and functions that can be
    used in SystemTap scripts. When a user runs a SystemTap script,
    SystemTap checks the script's probe events and handlers against the
    tapset library. SystemTap then loads the corresponding probes and
    functions before translating the script to C. Like SystemTap scripts
    themselves, tapsets use the file name extension
    *.stp.
   
However, unlike SystemTap scripts, tapsets are not meant for direct execution. They constitute the library from which other scripts can pull definitions. Thus, the tapset library is an abstraction layer designed to make it easier for users to define events and functions. Tapsets provide aliases for functions that users might want to specify as an event. Knowing the proper alias is often easier than remembering specific kernel functions that might vary between kernel versions.
4.1.3 Commands and privileges #
    The main commands associated with SystemTap are stap
    and staprun. To execute them, you either need
    root privileges or must be a member of the
    stapdev or
    stapusr group.
   
- stap
- SystemTap front-end. Runs a SystemTap script (either from file, or from standard input). It translates the script into C code, compiles it, and loads the resulting kernel module into a running Linux kernel. Then, the requested system trace or probe functions are performed. 
- staprun
- SystemTap back-end. Loads and unloads kernel modules produced by the SystemTap front-end. 
    For a list of options for each command, use --help. For
    details, refer to the stap and the
    staprun man pages.
   
 
    To avoid giving root access to users solely to enable them to work
    with SystemTap, use one of the following SystemTap groups. They are not available
    by default on SUSE Linux Enterprise Server, but you can create the groups and modify the
    access rights accordingly.  Also adjust the permissions of the
    staprun command if the security implications are
    appropriate for your environment.
   
- stapdev
- Members of this group can run SystemTap scripts with stap, or run SystemTap instrumentation modules with staprun. As running stap involves compiling scripts into kernel modules and loading them into the kernel, members of this group still have effective root access.
- stapusr
- Members of this group are only allowed to run SystemTap instrumentation modules with staprun. In addition, they can only run those modules from /lib/modules/KERNEL_VERSION/systemtap/. This directory must be owned by root and must only be writable for the root user.
4.1.4 Important files and directories #
The following list gives an overview of the SystemTap main files and directories.
- /lib/modules/KERNEL_VERSION/systemtap/
- Holds the SystemTap instrumentation modules. 
- /usr/share/systemtap/tapset/
- Holds the standard library of tapsets. 
- /usr/share/doc/packages/systemtap/examples
- Holds several example SystemTap scripts for various purposes. Only available if the systemtap-docs package is installed.
- ~/.systemtap/cache
- Data directory for cached SystemTap files. 
- /tmp/stap*
- Temporary directory for SystemTap files, including translated C code and kernel object. 
4.2 Installation and setup #
   As SystemTap needs information about the kernel, some additional
   kernel-related packages must be installed. For each kernel you want to
   probe with SystemTap, you need to install a set of the following
   packages. This set should exactly match the kernel version and flavor
   (indicated by * in the overview below).
  
    If you subscribed your system for online updates, you can find
    “debuginfo” packages in the
    *-Debuginfo-Updates online installation repository
    relevant for SUSE Linux Enterprise Server 15 SP3. Use YaST to
    enable the repository.
   
   For the classic SystemTap setup, install the following packages (using
   either YaST or zypper).
  
- systemtap
- systemtap-server
- systemtap-docs (optional)
- kernel-*-base
- kernel-*-debuginfo
- kernel-*-devel
- kernel-source-*
- gcc
   To get access to the man pages and to a helpful collection of example
   SystemTap scripts for various purposes, additionally install the
   systemtap-docs package.
  
   To check if all packages are correctly installed on the machine and if
   SystemTap is ready to use, execute the following command as
   root.
  
# stap -v -e 'probe vfs.read {printf("read performed\n"); exit()}'
It probes the currently used kernel by running a script and returning an output. If the output is similar to the following, SystemTap is successfully deployed and ready to use:
Pass 1: parsed user script and 59 library script(s) in 80usr/0sys/214real ms.
Pass 2: analyzed script: 1 probe(s), 11 function(s), 2 embed(s), 1 global(s) in 140usr/20sys/412real ms.
Pass 3: translated to C into "/tmp/stapDwEk76/stap_1856e21ea1c246da85ad8c66b4338349_4970.c" in 160usr/0sys/408real ms.
Pass 4: compiled C into "stap_1856e21ea1c246da85ad8c66b4338349_4970.ko" in 2030usr/360sys/10182real ms.
Pass 5: starting run.
read performed
Pass 5: run completed in 10usr/20sys/257real ms.
The output reflects the following steps, in order:
- Checks the script against the existing tapset library.
- Examines the script for its components.
- Translates the script to C. Runs the system C compiler to create a kernel module from it. Both the resulting C code and the kernel module are kept in the SystemTap cache described in Section 4.1, “Conceptual overview”.
- Loads the module and enables all the probes (events and handlers) in the script by hooking into the kernel. The event being probed is a Virtual File System (VFS) read. As the event occurs on any processor, a valid handler is executed (it prints the text read performed).
- After the SystemTap session is terminated, the probes are disabled, and the kernel module is unloaded.
In case any error messages appear during the test, check the output for hints about any missing packages and make sure they are installed correctly. Rebooting and loading the appropriate kernel may also be needed.
4.3 Script syntax #
SystemTap scripts consist of the following two components:
- SystemTap events (probe points)
- Name the kernel events at which the associated handler should be executed. Examples for events are entering or exiting a certain function, a timer expiring, or starting or terminating a session.
- SystemTap handlers (probe body)
- Series of script language statements that specify the work to be done whenever a certain event occurs. This normally includes extracting data from the event context, storing them into internal variables, or printing results. 
   An event and its corresponding handler are collectively called a
   probe. SystemTap events are also called probe
   points. A probe's handler is also called a probe
   body.
  
   Comments can be inserted anywhere in the SystemTap script in various
   styles: using either #, /* */, or
   // as marker.
  
4.3.1 Probe format #
A SystemTap script can have multiple probes. They must be written in the following format:
probe EVENT {STATEMENTS}
    Each probe has a corresponding statement block. This statement block
    must be enclosed in { } and contains the statements
    to be executed per event.
   
The following example shows a simple SystemTap script.
Simple SystemTap script #
probe begin
{
  printf ("hello world\n")
  exit ()
}
The elements of the script are the following:
- probe
- Start of the probe.
- begin
- Event begin.
- {
- Start of the handler definition, indicated by {.
- printf
- First function defined in the handler: the printf function.
- ("hello world\n")
- String to be printed by the printf function.
- exit ()
- Second function defined in the handler: the exit() function.
- }
- End of the handler definition, indicated by }.
     The event begin (the start of the SystemTap session) triggers the
     handler enclosed in { }. Here, that is the printf function.
     In this case, it prints hello world followed by a new line.
     Then, the script exits.
    
If your statement block holds several statements, SystemTap executes these statements in sequence. You do not need to insert special separators or terminators between multiple statements. A statement block can also be nested within another statement block. Generally, statement blocks in SystemTap scripts use the same syntax and semantics as in the C programming language.
4.3.2 SystemTap events (probe points) #
SystemTap supports several built-in events.
    The general event syntax is a dotted-symbol sequence. This allows a
    breakdown of the event namespace into parts. Each component identifier
    may be parameterized by a string or number literal, with a syntax like a
    function call. A component may include a * character,
    to expand to other matching probe points. A probe point may be followed
    by a ? character, to indicate that it is optional,
    and that no error should result if it fails to expand.
    
    Alternately, a probe point may be followed by a !
    character to indicate that it is both optional and sufficient.
   
    SystemTap supports multiple events per probe. They need to be
    separated by a comma (,). If multiple events are
    specified in a single probe, SystemTap executes the handler when any
    of the specified events occurs.
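As a minimal sketch of a probe with several events, the following handler is attached to both the begin event and a timer event, and uses the pp() function to show which event triggered it. The 2-second and 7-second intervals are arbitrary values chosen for illustration.

```systemtap
# The handler runs once at session start and then again every 2 seconds.
probe begin, timer.s(2)
{
  printf("handler triggered by %s\n", pp())
}

# Second probe: end the session after 7 seconds.
probe timer.s(7)
{
  exit()
}
```

Because both events share one handler, the same printf statement serves both of them. A separate probe is only needed for the exit condition.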
   
In general, events can be classified into the following categories:
- Synchronous events: Occur when any process executes an instruction at a particular location in kernel code. This gives other events a reference point (instruction address) from which more contextual data may be available. An example for a synchronous event is vfs.FILE_OPERATION: The entry to the FILE_OPERATION event for Virtual File System (VFS). For example, in Section 4.2, “Installation and setup”, read is the FILE_OPERATION event used for VFS.
- Asynchronous events: Not tied to a particular instruction or location in code. This family of probe points consists mainly of counters, timers, and similar constructs. Examples for asynchronous events are begin (start of a SystemTap session, when a SystemTap script is run), end (end of a SystemTap session), or timer events. Timer events specify a handler to be executed periodically, for example timer.s(SECONDS) or timer.ms(MILLISECONDS). When used together with other probes that collect information, timer events allow you to print periodic updates and see how that information changes over time.
For example, the following probe would print the text “hello world” every 4 seconds:
probe timer.s(4)
{
   printf("hello world\n")
}
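The following sketch shows one way a timer event can be combined with a probe that collects information. It counts VFS reads with the vfs.read event used in Section 4.2, “Installation and setup”, and prints the count periodically. The variable name reads and the 5-second interval are assumptions made for illustration. The global keyword is explained in Section 4.3.3.2.1, “Variables”.

```systemtap
global reads

# Collect: count every VFS read operation.
probe vfs.read
{
  reads++
}

# Report: print the counter every 5 seconds, then reset it.
probe timer.s(5)
{
  printf("VFS reads in the last 5 seconds: %d\n", reads)
  reads = 0
}
```

The script contains no exit() call, so it keeps running until you stop it manually with Ctrl–C.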
    For detailed information about supported events, refer to the
    stapprobes man page. The See
    Also section of the man page also contains links to other
    man pages that discuss supported events for specific subsystems and
    components.
   
4.3.3 SystemTap handlers (probe body) #
Each SystemTap event is accompanied by a corresponding handler defined for that event, consisting of a statement block.
4.3.3.1 Functions #
     If you need the same set of statements in multiple probes, you can
     place them in a function for easy reuse. Functions are defined by the
     keyword function followed by a name. They take any
     number of string or numeric arguments (by value) and may return a
     single string or number.
    
function FUNCTION_NAME(ARGUMENTS) {STATEMENTS}
probe EVENT {FUNCTION_NAME(ARGUMENTS)}
The statements in FUNCTION_NAME are executed when the probe for EVENT executes. The ARGUMENTS are optional values passed into the function.
Functions can be defined anywhere in the script.
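As a sketch of the format above, the following script defines a function and calls it from a probe. The function name proc_label and its argument prefix are hypothetical names chosen for this example. The sprintf function used here works like printf but returns the formatted string instead of printing it.

```systemtap
# Hypothetical helper: returns a string such as "session: NAME(PID)".
# The argument and return types are inferred as strings.
function proc_label(prefix)
{
  return sprintf("%s: %s(%d)", prefix, execname(), pid())
}

probe begin
{
  printf("%s\n", proc_label("session"))
  exit()
}
```

The function is defined before the probe here, but the order does not matter.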
     One of the functions needed very often was already introduced in
     Example 4.1, “Simple SystemTap script”: the printf
     function for printing data in a formatted way. When using the
     printf function, you can specify how arguments
     should be printed by using a format string. The format string is
     included in quotation marks and can contain further format specifiers,
     introduced by a % character.
    
Which format strings to use depends on your list of arguments. Format strings can have multiple format specifiers—each matching a corresponding argument. Multiple arguments can be separated by a comma.
printf Function with format specifiers #
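A minimal sketch of such a printf call, consistent with the description that follows. The surrounding probe on the syscall.open event is an assumption made so that the statement can run. Only the format string and the two arguments are determined by the description.

```systemtap
# Assumed event: entry to the open system call.
probe syscall.open
{
  # %s takes the string from execname(), %d the integer from pid()
  printf("%s(%d) open\n", execname(), pid())
}
```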
     The example above prints the current executable name
     (execname()) as a string and the process ID
     (pid()) as an integer in brackets. Then, a space,
     the word open and a line break follow:
    
[...]
vmware-guestd(2206) open
hald(2360) open
[...]
     Apart from the two functions execname() and
     pid() used in
     Example 4.3, “printf Function with format specifiers”, a variety of other
     functions can be used as printf arguments.
    
Among the most commonly used SystemTap functions are the following:
- tid()
- ID of the current thread. 
- pid()
- Process ID of the current thread. 
- uid()
- ID of the current user. 
- cpu()
- Current CPU number. 
- execname()
- Name of the current process. 
- gettimeofday_s()
- Number of seconds since Unix epoch (January 1, 1970). 
- ctime()
- Convert time into a string. 
- pp()
- String describing the probe point currently being handled. 
- thread_indent()
- Useful function for organizing print results. It (internally) stores an indentation counter for each thread (tid()). The function takes one argument, an indentation delta, indicating how many spaces to add or remove from the thread's indentation counter. It returns a string with some generic trace data along with an appropriate number of indentation spaces. The generic data returned includes a time stamp (number of microseconds since the initial indentation for the thread), a process name, and the thread ID itself. This allows you to identify what functions were called, who called them, and how long they took. Call entries and exits often do not immediately precede each other (otherwise it would be easy to match them). In between a first call entry and its exit, usually other call entries and exits are made. The indentation counter helps you match an entry with its corresponding exit as it indents the next function call in case it is not the exit of the previous one.
     For more information about supported SystemTap functions, refer to the
     stapfuncs man page.
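The following sketch illustrates how thread_indent() can be paired on function entry and return. The probed source file net/socket.c is only an example target and an assumption of this sketch. Resolving it requires the kernel debuginfo packages listed in Section 4.2, “Installation and setup”.

```systemtap
# On entry: increase the indentation of the current thread by one.
probe kernel.function("*@net/socket.c").call
{
  printf("%s -> %s\n", thread_indent(1), pp())
}

# On return: decrease the indentation again.
probe kernel.function("*@net/socket.c").return
{
  printf("%s <- %s\n", thread_indent(-1), pp())
}
```

Nested calls appear indented below their caller, so each "->" line can be matched with the "<-" line at the same indentation level. Stop the script with Ctrl–C.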
    
4.3.3.2 Other basic constructs #
     Apart from functions, you can use other common constructs in
     SystemTap handlers, including variables, conditional statements (like
     if/else), while
     loops, for loops, arrays, or command line arguments.
    
4.3.3.2.1 Variables #
Variables may be defined anywhere in the script. To define one, simply choose a name and assign a value from a function or expression to it:
foo = gettimeofday_s()
      Then you can use the variable in an expression. From the type of
      values assigned to the variable, SystemTap automatically infers the
      type of each identifier (string or number). Any inconsistencies will
      be reported as errors. In the example above, foo
      would automatically be classified as a number and could be printed via
      printf() with the integer format specifier
      (%d).
     
      However, by default, variables are local to the probe they are used
      in: They are initialized, used and disposed of at each handler
      invocation. To share variables between probes, declare them global
      anywhere in the script. To do so, use the global
      keyword outside of the probes:
     
global count_jiffies, count_ms
probe timer.jiffies(100) { count_jiffies ++ }
probe timer.ms(100) { count_ms ++ }
probe timer.ms(12345)
{
  hz=(1000*count_jiffies) / count_ms
  printf ("jiffies:ms ratio %d:%d => CONFIG_HZ=%d\n",
    count_jiffies, count_ms, hz)
  exit ()
}
       This example script computes the CONFIG_HZ setting of the kernel by
       using timers that count jiffies and milliseconds, then computing
       accordingly. (A jiffy is the duration of one tick of the system timer
       interrupt. It is not an absolute time interval unit, since its
       duration depends on the clock interrupt frequency of the particular
       hardware platform). With the global statement it
       is possible to use the variables count_jiffies and
       count_ms also in the probe
       timer.ms(12345). With ++ the
       value of a variable is incremented by 1.
      
4.3.3.2.2 Conditional statements #
There are several conditional statements that you can use in SystemTap scripts. The following are probably the most common:
- If/else statements
- They are expressed in the following format:
  if (CONDITION) STATEMENT1 else STATEMENT2
  The if statement compares an integer-valued expression to zero. If the condition expression CONDITION is non-zero, the first statement STATEMENT1 is executed. If the condition expression is zero, the second statement STATEMENT2 is executed. The else clause (else STATEMENT2) is optional. Both STATEMENT1 and STATEMENT2 can also be statement blocks.
- While loops
- They are expressed in the following format:
  while (CONDITION) STATEMENT
  As long as CONDITION is non-zero, the statement STATEMENT is executed. STATEMENT can also be a statement block. It must change a value so that CONDITION will eventually be zero.
- For loops
- They are a shortcut for while loops and are expressed in the following format:
  for (INITIALIZATION; CONDITIONAL; INCREMENT) STATEMENT
  The expression specified in INITIALIZATION is used to initialize a counter for the number of loop iterations and is executed before execution of the loop starts. The execution of the loop continues until the loop condition CONDITIONAL is false. (This expression is checked at the beginning of each loop iteration.) The expression specified in INCREMENT is used to increment the loop counter. It is executed at the end of each loop iteration.
- Conditional operators
- The following operators can be used in conditional statements:
  ==: Is equal to
  !=: Is not equal to
  >=: Is greater than or equal to
  <=: Is less than or equal to
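The constructs above can be combined as in the following sketch. It runs entirely in a begin probe, so it does not depend on any kernel event. The variable names sum, i and n are chosen for illustration only.

```systemtap
probe begin
{
  # for loop: add up the integers from 1 to 5
  sum = 0
  for (i = 1; i <= 5; i++) {
    sum += i
  }

  # while loop: count down from 3 until the condition is zero
  n = 3
  while (n > 0) {
    n--
  }

  # if/else with statement blocks
  if (sum == 15) {
    printf("sum=%d n=%d\n", sum, n)
  } else {
    printf("unexpected sum: %d\n", sum)
  }
  exit()
}
```

The script prints sum=15 n=0 and then exits.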
4.4 Example script #
   If you have installed the
   systemtap-docs package, you can
   find several useful SystemTap example scripts in
   /usr/share/doc/packages/systemtap/examples.
  
   This section describes a rather simple example script in more detail:
   /usr/share/doc/packages/systemtap/examples/network/tcp_connections.stp.
  
tcp_connections.stp #
#! /usr/bin/env stap
probe begin {
  printf("%6s %16s %6s %6s %16s\n",
         "UID", "CMD", "PID", "PORT", "IP_SOURCE")
}
probe kernel.function("tcp_accept").return?,
      kernel.function("inet_csk_accept").return? {
  sock = $return
  if (sock != 0)
    printf("%6d %16s %6d %6d %16s\n", uid(), execname(), pid(),
           inet_get_local_port(sock), inet_get_ip_source(sock))
}
This SystemTap script monitors the incoming TCP connections and helps to identify unauthorized or unwanted network access requests in real time. It shows the following information for each new incoming TCP connection accepted by the computer:
- User ID (UID)
- Command accepting the connection (CMD)
- Process ID of the command (PID)
- Port used by the connection (PORT)
- IP address from which the TCP connection originated (IP_SOURCE)
To run the script, execute
stap /usr/share/doc/packages/systemtap/examples/network/tcp_connections.stp
and follow the output on the screen. To manually stop the script, press Ctrl–C.
4.5 User space probing #
For debugging user space applications (like DTrace can do), SUSE Linux Enterprise Server 15 SP3 supports user space probing with SystemTap: Custom probe points can be inserted in any user space application. Thus, SystemTap lets you use both kernel space and user space probes to debug the behavior of the whole system.
   To get the required utrace infrastructure and the uprobes kernel module
   for user space probing, you need to install the
   kernel-trace package in
   addition to the packages listed in
   Section 4.2, “Installation and setup”.
  
   utrace implements a framework for controlling
   user space tasks. It provides an interface that can be used by various
   tracing “engines”, implemented as loadable kernel modules.
   The engines register callback functions for specific events, then attach
   to whichever thread they want to trace. As the callbacks are made from
   “safe” places in the kernel, this allows for great leeway in
   the kinds of processing the functions can do. Various events can be
   watched via utrace, for example, system call entry and exit, fork(),
   signals being sent to the task, etc. More details about the utrace
   infrastructure are available at
   https://sourceware.org/systemtap/wiki/utrace.
  
SystemTap includes support for probing the entry into and return from a function in user space processes, probing predefined markers in user space code, and monitoring user-process events.
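A minimal sketch of two of these probe types, assuming /bin/ls as the target program (the path and the function name main are assumptions for illustration). Resolving a function in a user space program usually requires debugging information for that program, so the function probe is marked as optional with the ? character.

```systemtap
# User-process event: a process running /bin/ls is created.
probe process("/bin/ls").begin
{
  printf("%s(%d) started\n", execname(), pid())
}

# Function entry in user space. The "?" makes the probe optional,
# so the script still runs if the function cannot be resolved.
probe process("/bin/ls").function("main")?
{
  printf("%s(%d) entered main\n", execname(), pid())
}
```

To try the sketch, start the script, run ls in another terminal, and stop the script with Ctrl–C.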
To check if the currently running kernel provides the needed utrace support, use the following command:
> sudo grep CONFIG_UTRACE /boot/config-`uname -r`
For more details about user space probing, refer to https://sourceware.org/systemtap/SystemTap_Beginners_Guide/userspace-probing.html.
4.6 More information #
This chapter only provides a short SystemTap overview. Refer to the following links for more information about SystemTap:
- https://sourceware.org/systemtap/
- SystemTap project home page. 
- https://sourceware.org/systemtap/wiki/
- Huge collection of useful information about SystemTap, ranging from detailed user and developer documentation to reviews and comparisons with other tools, or Frequently Asked Questions and tips. Also contains collections of SystemTap scripts, examples and usage stories and lists recent talks and papers about SystemTap. 
- https://sourceware.org/systemtap/documentation.html
- Features a SystemTap Tutorial, a SystemTap Beginner's Guide, a Tapset Developer's Guide, and a SystemTap Language Reference in PDF and HTML format. Also lists the relevant man pages. 
   You can also find the SystemTap language reference and SystemTap tutorial
   in your installed system under
   /usr/share/doc/packages/systemtap. Example SystemTap
   scripts are available from the examples subdirectory.