Analyzing CPU Performance in Linux

Analyzing CPU Performance in Linux: A Comprehensive Guide

Introduction

Understanding and optimizing CPU performance is crucial for maintaining the efficiency of Linux systems. This guide will provide insights into basic commands and methodologies to analyze and optimize CPU performance.

Using the top Command to Analyze CPU Load

One of the most fundamental tools for monitoring system performance in Linux is the top command. Here is an example of the summary area at the top of its output:

top - 18:07:42 up 5 days, 18:20, 2 users, load average: 0.65, 0.35, 0.23
Tasks: 197 total, 3 running, 194 sleeping, 0 stopped, 0 zombie
%Cpu(s): 37.0 us, 3.7 sy, 0.0 ni, 58.1 id, 0.0 wa, 0.0 hi, 1.2 si, 0.0 st
MiB Mem : 7649.3 total, 1286.3 free, 2084.5 used, 4278.5 buff/cache
MiB Swap: 4096.0 total, 4095.7 free, 0.3 used. 5160.2 avail Mem

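One important metric is the load average, shown at the end of the first line for the last 1, 5, and 15 minutes. On Linux it is the average number of tasks that are running, waiting to run, or blocked in uninterruptible sleep (usually waiting on I/O). Here, all three values are below 1, which indicates a light workload.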

Interpreting Load Averages:

The first value (0.65) is the load average over the last minute.
The second value (0.35) is the load average over the last 5 minutes.
The third value (0.23) is the load average over the last 15 minutes.
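
Read together, the three values also show a trend: a 1-minute value above the 15-minute value means load has been rising, and the reverse means it is easing. The sketch below illustrates this comparison in shell. It assumes input in the format of Linux's /proc/loadavg (three load averages followed by two other fields); the function name load_trend and the last two fields of the sample line are made up for illustration.

```shell
#!/bin/sh
# load_trend: hypothetical helper, not a standard command.
# Reads one /proc/loadavg-style line from stdin and compares the
# 1-minute average (field 1) with the 15-minute average (field 3).
load_trend() {
    awk '{
        if ($1 > $3)      print "rising"
        else if ($1 < $3) print "falling"
        else              print "steady"
    }'
}

# Load averages from the top output above; the last two fields are placeholders.
echo "0.65 0.35 0.23 3/197 51500" | load_trend   # prints "rising"

# On a live Linux system, read the current values from the kernel.
if [ -r /proc/loadavg ]; then
    load_trend < /proc/loadavg
fi
```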

Assessing CPU Allocation with lscpu

Next, check how many CPUs the system has, so that the load can be judged against the available resources:

lscpu | grep CPU
CPU op-mode(s): 32-bit, 64-bit
CPU(s): 2
On-line CPU(s) list: 0,1

If the load average approaches or exceeds the number of CPUs (2 in this case), tasks start to queue for CPU time and the server may feel slow or laggy. Here the 1-minute load average is 0.65, well below 2, so the server is not busy.
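
This comparison can be expressed as a single ratio: load average divided by CPU count, where 100% means that every CPU has, on average, one task to run. The following is a minimal sketch, not a standard tool. load_per_cpu is a hypothetical helper name, and the live part assumes a Linux system with /proc/loadavg and the coreutils nproc command.

```shell
#!/bin/sh
# load_per_cpu: hypothetical helper.
# Usage: load_per_cpu LOAD_AVERAGE CPU_COUNT
# Prints the load average as a percentage of the CPU count.
load_per_cpu() {
    awk -v avg="$1" -v ncpu="$2" 'BEGIN { printf "%.1f%%\n", 100 * avg / ncpu }'
}

# Values from the example above: load 0.65 on 2 CPUs.
load_per_cpu 0.65 2   # prints "32.5%"

# Live system: 1-minute load average against the CPUs available to this process.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    read -r one rest < /proc/loadavg
    load_per_cpu "$one" "$(nproc)"
fi
```

A result that stays near or above 100% corresponds to the situation described above, where the load average reaches the number of CPUs.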

Analyzing CPU Utilization

Analyze the following section of the top command output to understand why the system might be experiencing increased load:

%Cpu(s): 37.0 us, 3.7 sy, 0.0 ni, 58.1 id, 0.0 wa, 0.0 hi, 1.2 si, 0.0 st

us (user): The percentage of CPU time spent running user-space code of un-niced processes, that is, application code rather than kernel code. Here, 37.0 us means that 37.0% of CPU time went to user-space code.
sy (system): The percentage of CPU time spent in kernel mode. This includes kernel work done on behalf of user processes, such as system calls, not only the operating system’s own tasks. 3.7 sy means that 3.7% of CPU time was spent in the kernel.
ni (nice): The percentage of CPU time spent running user-space code of niced processes, that is, processes whose priority was lowered with a positive nice value. 0.0 ni means that no CPU time is going to such processes at this moment.
id (idle): The percentage of time the CPU was not running any process. 58.1 id means that 58.1% of the CPU’s capacity was unused at the time of reporting. A high idle percentage generally means that the system is not very busy.
wa (I/O wait): The percentage of time the CPU was idle while an I/O operation, such as a disk read or write, was still outstanding. 0.0 wa shows that this is not happening in this snapshot. This metric is crucial for diagnosing I/O bottlenecks, where work stalls on storage rather than on the CPU.
hi, si, st: The percentage of CPU time spent servicing hardware interrupts (hi), servicing software interrupts (si), and taken by the hypervisor for other virtual machines (st, steal). Here they are 0.0, 1.2, and 0.0.

With 37% of CPU time going to user processes and 58.1% idle, there is moderate activity but still substantial capacity for additional load. If system (sy) or I/O wait (wa) time were high, that could point to a kernel-level or I/O problem. In this case, kernel time at 3.7% (3.7 sy) and I/O wait at 0% (0.0 wa) do not suggest such issues.
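
To work with these numbers in a script, individual fields can be pulled out of the summary line. The sketch below is one way to do it with awk. cpu_field is a hypothetical helper, and it assumes a locale that uses a dot as the decimal separator.

```shell
#!/bin/sh
# cpu_field: hypothetical helper.
# Usage: cpu_field STATE   (STATE is us, sy, ni, id, wa, hi, si, or st)
# Reads a "%Cpu(s):" summary line from stdin and prints the value for STATE.
cpu_field() {
    awk -v key="$1" '{
        gsub(/,/, " ")   # top can omit the space after a comma, e.g. "ni,100.0 id"
        for (i = 2; i <= NF; i++)
            if ($i == key) { print $(i - 1); exit }
    }'
}

# Summary line from the example above.
line='%Cpu(s): 37.0 us, 3.7 sy, 0.0 ni, 58.1 id, 0.0 wa, 0.0 hi, 1.2 si, 0.0 st'

idle=$(printf '%s\n' "$line" | cpu_field id)
echo "idle: $idle%"                                                     # idle: 58.1%
awk -v idle="$idle" 'BEGIN { printf "busy: %.1f%%\n", 100 - idle }'     # busy: 41.9%
```

The busy figure of 41.9% matches the sum of the non-idle fields in the example (37.0 + 3.7 + 1.2). On a live system the same function can read from top in batch mode, for example `top -b -n 1 | grep '%Cpu' | cpu_field wa`.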

Even if the CPU has a high idle percentage (id), a high I/O wait percentage (wa) can still cause delays and impact the user experience.

Here is an example of a read/write-heavy scenario where the disk is not fast enough:

%Cpu(s): 15.0 us, 5.0 sy, 0.0 ni, 60.0 id, 20.0 wa

15.0 us: 15% of the CPU time is spent on user processes.
5.0 sy: 5% of the CPU time is spent on system processes.
60.0 id: 60% of the CPU time is idle.
20.0 wa: 20% of the CPU time is spent waiting for I/O operations.

Even though the CPU is idle 60% of the time, the 20% I/O wait time is significant.
This high wa value suggests that a considerable amount of CPU time is spent waiting for I/O operations to complete.
As a result, processes that rely on these I/O operations are delayed, leading to a slow user experience.

If you consistently observe a high wa percentage (for example, above 5-10%), especially during normal operations, this is a strong indicator of an I/O bottleneck that could affect performance.
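
Whether wa is consistently high is easier to judge from several samples than from one snapshot. As an alternative to top, the sketch below averages the wa column reported by vmstat, which is assumed to be installed (it usually ships in the procps package). avg_iowait is a hypothetical helper. It finds the column by its header name and skips the first data row, which vmstat documents as an average since boot.

```shell
#!/bin/sh
# avg_iowait: hypothetical helper that reads vmstat output on stdin
# and prints the mean of the "wa" column across the interval samples.
avg_iowait() {
    awk '
        # Until the header row is seen, look for the column named "wa".
        col == 0 { for (i = 1; i <= NF; i++) if ($i == "wa") col = i; next }
        # The first data row is the since-boot average, so skip it.
        !skipped { skipped = 1; next }
        { sum += $col; n++ }
        END { if (n) printf "%.1f\n", sum / n; else print "n/a" }
    '
}

# Take three reports one second apart (two usable samples), if vmstat is installed.
if command -v vmstat >/dev/null 2>&1; then
    vmstat 1 3 | avg_iowait
fi
```

For a longer observation window, raise the count, for example `vmstat 5 13 | avg_iowait` for roughly one minute of 5-second samples.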

Identifying Resource-Intensive Processes

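The process list that top shows below the summary area identifies which processes are using the CPU:

  PID USER     PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
51420 apache   20   0  397908  71520  17504 S   8.6   0.9   6:31.21 php-fpm
51421 apache   20   0  398128  73736  19496 S   8.6   0.9   6:33.04 php-fpm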
... (additional processes)

The php-fpm worker processes, run by the apache user, each show 8.6% CPU and appear at the top of the list. They are a natural focal point for performance optimization, especially if they consistently show high CPU usage.

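Besides top, you can list the most CPU-intensive processes with ps. Note that ps reports %CPU as an average over each process’s lifetime, so its figures can differ from the per-interval values in top: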

sudo ps -eo pid,user,ppid,cmd,%mem,%cpu --sort=-%cpu | head
    PID USER        PPID CMD                         %MEM %CPU
  50383 root       50381 /usr/libexec/oracle-cloud-a  3.8  0.7
  52386 mysql          1 /usr/libexec/mysqld --based  6.9  0.2
      1 root           0 /usr/lib/systemd/systemd --  0.2  0.0
      ..........................................................
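
Because a service such as php-fpm runs as several worker processes, per-process figures can understate what the service uses as a whole. The sketch below sums %CPU by command name. cpu_by_command is a hypothetical helper, and the live part assumes a ps that accepts the POSIX -o pcpu= and -o comm= options, as the procps version does.

```shell
#!/bin/sh
# cpu_by_command: hypothetical helper. Reads "%CPU COMMAND" lines on stdin,
# sums %CPU per command name, and prints the totals in descending order.
cpu_by_command() {
    awk '{
        cpu = $1
        sub(/^[ \t]*[^ \t]+[ \t]+/, "")   # strip the %CPU field, keep the name
        total[$0] += cpu
    }
    END { for (name in total) printf "%.1f %s\n", total[name], name }' |
        sort -rn
}

# The two php-fpm workers from the top output shown earlier.
printf '8.6 php-fpm\n8.6 php-fpm\n' | cpu_by_command   # prints "17.2 php-fpm"

# Live system: empty column headers (pcpu=, comm=) suppress the header line.
if command -v ps >/dev/null 2>&1; then
    ps -e -o pcpu= -o comm= | cpu_by_command | head -n 5
fi
```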
