6 Hardware-based performance monitoring with Perf
Perf is an interface to access the performance monitoring unit (PMU) of a processor and to record and display software events such as page faults. It supports system-wide, per-thread, and KVM virtualization guest monitoring.
You can store resulting information in a report. This report contains information about, for example, instruction pointers or what code a thread was executing.
Perf consists of two parts:
- Code integrated into the Linux kernel that instructs the hardware. 
- The perf user space utility, which allows you to use the kernel code and helps you analyze the gathered data.
6.1 Hardware-based monitoring
Performance monitoring means collecting information related to how an application or system performs. This information can be obtained either through software-based means or from the CPU or chipset. Perf integrates both of these methods.
Many modern processors contain a performance monitoring unit (PMU). The design and functionality of a PMU are CPU-specific. For example, the number of registers, counters, and supported features varies by CPU implementation.
Each PMU model consists of a set of registers: performance monitor configuration (PMC) registers and performance monitor data (PMD) registers. Both kinds can be read, but only PMCs are writable. PMCs store the configuration information, and PMDs store the collected data.
6.2 Sampling and counting
Perf supports several profiling modes:
- Counting. Count the number of occurrences of an event. 
- Event-based sampling. A less exact way of counting: A sample is recorded whenever a certain threshold number of events has occurred. 
- Time-based sampling. A less exact way of counting: A sample is recorded at a defined frequency.
- Instruction-based sampling (AMD64 only). The processor follows instructions appearing in a given interval and samples which events they produce. This allows following up on individual instructions and seeing which of them is critical to performance. 
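The first three modes can be selected from the command line roughly as sketched below. This is an illustration under stated assumptions, not a definitive recipe: the helper name perf_args_for_mode, the mode labels, the sample period of 100000 events, the frequency of 99 Hz, and the sleep 1 workload are all hypothetical choices. The -c (sample period) and -F (sample frequency) options belong to perf record. Instruction-based sampling is left out because it depends on AMD-specific hardware support.

```shell
#!/bin/sh
# Hypothetical helper: print the perf arguments that select a profiling mode.
perf_args_for_mode() {
    case "$1" in
        # Counting: report the total number of cpu-cycles events.
        counting)       echo "stat -e cpu-cycles" ;;
        # Event-based sampling: record one sample per 100000 events.
        event-sampling) echo "record -e cpu-cycles -c 100000" ;;
        # Time-based sampling: record about 99 samples per second.
        time-sampling)  echo "record -e cpu-cycles -F 99" ;;
        *)              echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Print the full command line for each mode; sleep 1 is a placeholder workload.
for mode in counting event-sampling time-sampling; do
    echo "perf $(perf_args_for_mode "$mode") -- sleep 1"
done
```

The script only prints the command lines, so it is safe to run on a system without Perf. To run one of the modes, paste the printed command into a shell.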
6.3 Installing Perf
The Perf kernel code is already included with the default kernel. To be able to use the user space utility, install the package perf.
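To confirm that the user space utility is available after installation, a check along the following lines may help. It is a minimal sketch: the function name check_perf and the messages are hypothetical, and the check only looks for a perf executable on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: report whether the perf utility is available on PATH.
check_perf() {
    if command -v perf >/dev/null 2>&1; then
        # Print the installed version of the user space utility.
        perf --version
    else
        echo "perf not found; install the perf package"
        return 1
    fi
}

check_perf || echo "Continuing without perf."
```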
6.4 Perf subcommands
To gather the required information, the perf tool has several subcommands. This section gives an overview of the most often used commands.
To see help in the form of a man page for any of the subcommands, use either perf help SUBCOMMAND or man perf-SUBCOMMAND.
- perf stat
- Start a program and create a statistical overview that is displayed after the program quits. perf stat is used to count events.
- perf record
- Start a program and create a report with performance counter information. The report is stored as perf.data in the current directory. perf record is used to sample events.
- perf report
- Display a report that was previously created with perf record.
- perf annotate
- Display a report file and an annotated version of the executed code. If debug symbols are installed, the source code is also displayed. 
- perf list
- List event types that Perf can report with the current kernel and with your CPU. You can filter event types by category. For example, to see hardware events only, use perf list hw.
  The man page for perf_event_open has short descriptions for the most important events. For example, to find a description of the event branch-misses, search for BRANCH_MISSES (note the spelling differences):
  > man perf_event_open | grep -A5 BRANCH_MISSES
  Sometimes, events may be ambiguous. The lowercase hardware event names are not the names of raw hardware events but instead the names of aliases created by Perf. These aliases map to differently named but similarly defined hardware events on each supported processor.
  For example, the cpu-cycles event is mapped to the hardware event UNHALTED_CORE_CYCLES on Intel processors. On AMD processors, however, it is mapped to the hardware event CPU_CLK_UNHALTED.
  Perf also allows measuring raw events specific to your hardware. To look up their descriptions, see the Architecture Software Developer's Manual of your CPU vendor. The relevant documents for AMD64/Intel 64 processors are linked to in Section 6.7, “More information”.
- perf top
- Display system activity as it happens. 
- perf trace
- This command behaves similarly to strace. With this subcommand, you can see which system calls are executed by a particular thread or process and which signals it receives.
6.5 Counting particular types of event
To count the number of occurrences of an event, such as those displayed by perf list, use:
# perf stat -e EVENT -a
To count multiple types of events at once, list them separated by commas. For example, to count cpu-cycles and instructions, use:
# perf stat -e cpu-cycles,instructions -a
To stop the session, press Ctrl–C.
You can also count the number of occurrences of an event within a particular time:
# perf stat -e EVENT -a -- sleep TIME
Replace TIME with a value in seconds.
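Counts gathered this way can also be processed by a script. The sketch below counts cpu-cycles and instructions system-wide for one second and derives instructions per cycle from the result. The helper name ipc_from_csv is hypothetical. The script assumes that perf stat -x, writes comma-separated lines with the counter value in field 1 and the event name in field 3. That layout can differ when per-CPU or interval options are added, and event names may carry prefixes or suffixes on some systems, so check the perf-stat man page for your version.

```shell
#!/bin/sh
# Hypothetical helper: read perf stat CSV lines on stdin and print
# instructions per cycle. Assumes field 1 is the count, field 3 the event name.
ipc_from_csv() {
    awk -F, '
        $3 ~ /^instructions/ { insns  = $1 }
        $3 ~ /^cpu-cycles/   { cycles = $1 }
        END {
            if (cycles + 0 > 0) printf "%.2f\n", insns / cycles
            else { print "no cpu-cycles count found"; exit 1 }
        }'
}

# Count both events system-wide for 1 second. This needs perf and
# sufficient privileges. perf stat writes its results to stderr,
# so stderr is redirected into the pipe.
if command -v perf >/dev/null 2>&1; then
    perf stat -x, -e cpu-cycles,instructions -a -- sleep 1 2>&1 | ipc_from_csv \
        || echo "Counting failed; check permissions and event support."
fi
```

The helper can also be fed saved output, for example from a file written during an earlier session.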
6.6 Recording events specific to particular commands
There are several ways to sample events specific to a particular command:
- To create a report for a newly invoked command, use:
  # perf record COMMAND
  Then, use the started process normally. When you quit the process, the Perf session also stops.
- To create a report for the entire system while a newly invoked command is running, use:
  # perf record -a COMMAND
  Then, use the started process normally. When you quit the process, the Perf session also stops.
- To create a report for an already running process, use:
  # perf record -p PID
  Replace PID with a process ID. To stop the session, press Ctrl–C.
Now you can view the gathered data (perf.data) using:
> perf report
This opens a pseudo-graphical interface. To receive help, press H. To quit, press Q.
If you prefer a graphical interface, try the GTK+ interface of Perf:
> perf report --gtk
However, the GTK+ interface is limited in functionality.
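In scripts or remote sessions where neither interface is convenient, the report can also be printed as plain text with the --stdio option. The following sketch records a short workload and prints its report. The function name record_and_report, the sleep 1 workload, and the temporary directory are hypothetical choices; -o and -i select the data file for perf record and perf report.

```shell
#!/bin/sh
# Hypothetical helper: record the given command, then print a text report.
record_and_report() {
    tmpdir=$(mktemp -d) || return 1
    datafile="$tmpdir/perf.data"
    # -o writes the samples to $datafile instead of ./perf.data;
    # -i makes perf report read the same file back.
    perf record -o "$datafile" -- "$@" >/dev/null 2>&1 \
        && perf report --stdio -i "$datafile"
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}

if command -v perf >/dev/null 2>&1; then
    record_and_report sleep 1 || echo "Recording failed; check permissions."
else
    echo "perf not found; skipping."
fi
```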
6.7 More information
This chapter only provides a short overview. Refer to the following links for more information:
- https://perf.wiki.kernel.org/index.php/Main_Page
- The project home page. It also features a tutorial on using perf.
- https://www.brendangregg.com/perf.html
- Unofficial page with many one-line examples of how to use perf.
- https://web.eece.maine.edu/~vweaver/projects/perf_events/
- Unofficial page with several resources, primarily relating to the Linux kernel code of Perf and its API. This page includes, for example, a CPU compatibility table and a programming guide. 
- https://www-ssl.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.pdf
- The Intel Architectures Software Developer's Manual, Volume 3B. 
- https://support.amd.com/TechDocs/24593.pdf
- The AMD Architecture Programmer's Manual, Volume 2. 
- Chapter 7, OProfile—system-wide profiler
- Consult this chapter for other performance optimizations.