Analyzing CPU Performance in Linux: A Comprehensive Guide
Introduction
Understanding and optimizing CPU performance is crucial for maintaining the efficiency of Linux systems. This guide will provide insights into basic commands and methodologies to analyze and optimize CPU performance.
Using the top Command to Analyze CPU Load
One of the most fundamental tools for monitoring system performance in Linux is the top command. The header of a typical top session looks like this:
top - 18:07:42 up 5 days, 18:20, 2 users, load average: 0.65, 0.35, 0.23
Tasks: 197 total, 3 running, 194 sleeping, 0 stopped, 0 zombie
%Cpu(s): 37.0 us, 3.7 sy, 0.0 ni, 58.1 id, 0.0 wa, 0.0 hi, 1.2 si, 0.0 st
MiB Mem : 7649.3 total, 1286.3 free, 2084.5 used, 4278.5 buff/cache
MiB Swap: 4096.0 total, 4095.7 free, 0.3 used. 5160.2 avail Mem
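One important metric to consider is the load average: the average number of tasks that were running or waiting to run over the last 1, 5, and 15 minutes (on Linux this also counts tasks in uninterruptible sleep, which are usually waiting on disk I/O). Here, the values indicate a light to moderate workload that has been rising slightly.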
Interpreting Load Averages:
- The first value shows the average load over the last minute (0.65).
- The second value represents the average load over the last 5 minutes (0.35).
- The third value indicates the average load over the last 15 minutes (0.23).
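To work with these three values in a script, they can be pulled out of the header line. The following is a minimal Python sketch, not part of top itself; the function name parse_load_average is an assumption made for illustration, and the sample string is the header shown above.

```python
import os
import re

def parse_load_average(header: str) -> tuple[float, float, float]:
    """Extract the 1, 5, and 15 minute load averages from a top/uptime header line."""
    match = re.search(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", header)
    if match is None:
        raise ValueError("no load average found in header")
    one, five, fifteen = (float(value) for value in match.groups())
    return one, five, fifteen

# Header line copied from the top output above.
sample = "top - 18:07:42 up 5 days, 18:20, 2 users, load average: 0.65, 0.35, 0.23"
print(parse_load_average(sample))  # (0.65, 0.35, 0.23)

# On a live Unix system the same three numbers are available without parsing.
if hasattr(os, "getloadavg"):
    print(os.getloadavg())
```

The regular expression assumes a locale that prints decimals with a dot, as in the sample output.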
Assessing CPU Allocation with lscpu
Next, it’s important to check how many CPUs are allocated to understand the load in relation to available resources:
lscpu | grep CPU
CPU op-mode(s): 32-bit, 64-bit
CPU(s): 2
On-line CPU(s) list: 0,1
If the load average approaches or exceeds the number of CPUs (2 in this case), tasks start queuing for CPU time and the server may experience slowness or lag. Here the 1-minute load average is 0.65, well below 2, so the server is not busy yet.
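The comparison between load average and CPU count can be expressed as a small calculation. This is a hedged sketch: the helper names load_per_cpu and is_saturated, and the default threshold of 1.0 load per CPU, are assumptions for illustration rather than a fixed rule.

```python
import os

def load_per_cpu(load_avg: float, cpu_count: int) -> float:
    """Normalize a load average by the number of CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_avg / cpu_count

def is_saturated(load_avg: float, cpu_count: int, threshold: float = 1.0) -> bool:
    """Return True when the load per CPU reaches the (assumed) threshold."""
    return load_per_cpu(load_avg, cpu_count) >= threshold

# Values from the example above: 1-minute load 0.65 on 2 CPUs.
print(load_per_cpu(0.65, 2))   # 0.325
print(is_saturated(0.65, 2))   # False

# On a live system, os.cpu_count() reports the number of logical CPUs.
cpus = os.cpu_count()
if cpus and hasattr(os, "getloadavg"):
    print(is_saturated(os.getloadavg()[0], cpus))
```

A load per CPU of 0.325 means roughly a third of the available CPU capacity is in demand, which matches the conclusion that the server is not busy.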
Analyzing CPU Utilization
The %Cpu(s) line of the top output shows where CPU time is going, which helps explain what is driving the load:
%Cpu(s): 37.0 us, 3.7 sy, 0.0 ni, 58.1 id, 0.0 wa, 0.0 hi, 1.2 si, 0.0 st
- us (user): The percentage of CPU time spent running user-space code (i.e., code outside the kernel). In this case, 37.0 us means that 37.0% of the CPU time is being used by user processes. This is typically code executed by user-installed applications.
- sy (system): The percentage of CPU time spent in kernel mode, including system calls made on behalf of user processes. 3.7 sy indicates that 3.7% of the CPU time is spent on kernel work.
- ni (nice): The percentage of CPU time used by low-priority (niced) user processes. A nice value is a user-set priority that affects process scheduling. 0.0 ni suggests that no CPU time is being used by niced processes at this moment.
- id (idle): The percentage of time the CPU is not being used by any process. 58.1 id indicates that 58.1% of the CPU’s capacity is unused at the time of reporting. A high idle percentage generally means that the system is not very busy.
- wa (I/O wait): The percentage of time the CPU was idle while waiting for I/O operations to complete, such as disk reads or writes. 0.0 wa shows that no such waiting occurred in this snapshot. This metric is crucial for diagnosing I/O bottlenecks where the CPU has to wait, affecting performance.
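To read these fields programmatically, the summary line can be split into name and value pairs. The sketch below is illustrative only; parse_cpu_line is a hypothetical helper, and it assumes the line format shown in this guide.

```python
import re

def parse_cpu_line(line: str) -> dict[str, float]:
    """Parse a top '%Cpu(s):' summary line into {field: percentage}."""
    # Everything after the first colon holds the "value name" pairs.
    _, _, fields = line.partition(":")
    pairs = re.findall(r"([\d.]+)\s+([a-z]+)", fields)
    return {name: float(value) for value, name in pairs}

# Summary line copied from the top output in this guide.
line = "%Cpu(s): 37.0 us, 3.7 sy, 0.0 ni, 58.1 id, 0.0 wa, 0.0 hi, 1.2 si, 0.0 st"
cpu = parse_cpu_line(line)
print(cpu["us"], cpu["id"], cpu["wa"])  # 37.0 58.1 0.0
```

Since the fields are shares of the same interval, they should add up to roughly 100, which is a quick sanity check on the parsed result.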
With 37% of CPU time going to user processes and 58.1% idle, the system shows moderate activity and still has substantial capacity for additional load. If system time (sy) or I/O wait (wa) were high, that would point to a kernel-level or I/O problem. In our case, kernel time of 3.7% (3.7 sy) and I/O wait of 0 (0.0 wa) do not suggest such issues.
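For background on where these percentages come from: on Linux, the kernel exposes cumulative CPU time counters on the first line of /proc/stat, and a breakdown like the one above can be derived from the difference between two readings. The following is a rough sketch under that assumption, not top's actual implementation; the function names and the counter values in the two sample lines are hypothetical, chosen so the result matches the snapshot discussed here.

```python
# Order of the first eight counters on the aggregate "cpu" line of /proc/stat.
# The trailing guest fields are skipped because guest time is already counted in user/nice.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")

def parse_stat_cpu(line: str) -> dict[str, int]:
    """Parse the aggregate 'cpu' line of /proc/stat into named counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, (int(v) for v in parts[1:1 + len(FIELDS)])))

def cpu_percentages(before: dict[str, int], after: dict[str, int]) -> dict[str, float]:
    """Convert the counter deltas between two snapshots into percentages."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    return {name: round(100.0 * delta / total, 1) for name, delta in deltas.items()}

def read_cpu_snapshot(path: str = "/proc/stat") -> dict[str, int]:
    """Read the current counters on a live Linux system (not called here)."""
    with open(path) as handle:
        return parse_stat_cpu(handle.readline())

# Hypothetical counter values for two readings taken a short time apart.
t0 = parse_stat_cpu("cpu 1000 0 200 8000 100 0 50 0 0 0")
t1 = parse_stat_cpu("cpu 1370 0 237 8581 100 0 62 0 0 0")
usage = cpu_percentages(t0, t1)
print(usage["user"], usage["system"], usage["idle"], usage["iowait"])  # 37.0 3.7 58.1 0.0
```

On a real system, calling read_cpu_snapshot() twice with a short sleep in between and passing both results to cpu_percentages would give a live breakdown.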
Even if the CPU has a high idle percentage (id), a high I/O wait percentage (wa) can still cause delays and impact the user experience.
Here is an example of a heavy read/write scenario where the disk is not fast enough:
%Cpu(s): 15.0 us, 5.0 sy, 0.0 ni, 60.0 id, 20.0 wa
- 15.0 us: 15% of the CPU time is spent on user processes.
- 5.0 sy: 5% of the CPU time is spent on system processes.
- 60.0 id: 60% of the CPU time is idle.
- 20.0 wa: 20% of the CPU time is spent waiting for I/O operations.
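The breakdown above is a single snapshot, and I/O wait can spike briefly without indicating a real problem. One way to judge whether it is persistent is to collect several wa readings and check how many exceed a threshold. The sketch below is an illustration only; the function name iowait_is_sustained, the 10% threshold, and the 80% fraction are assumptions, not established limits.

```python
def iowait_is_sustained(samples: list[float],
                        threshold: float = 10.0,
                        min_fraction: float = 0.8) -> bool:
    """Return True when most wa samples are above the threshold.

    samples      -- wa percentages collected at regular intervals
    threshold    -- wa percentage treated as "high" (assumed value)
    min_fraction -- share of samples that must be high (assumed value)
    """
    if not samples:
        raise ValueError("at least one sample is required")
    high = sum(1 for wa in samples if wa > threshold)
    return high / len(samples) >= min_fraction

# Hypothetical readings: a disk that is consistently too slow...
print(iowait_is_sustained([20.0, 18.5, 22.0, 19.0, 21.0]))  # True
# ...versus a single short spike on an otherwise quiet system.
print(iowait_is_sustained([0.0, 0.5, 20.0, 0.0]))           # False
```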
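If you consistently see a high wa percentage (e.g., > 5-10%), especially during normal operations, this is a stronger indicator of potential I/O bottlenecks that could affect performance.
Identifying Resource-Intensive Processes
The process list in the top output shows which processes are using the most CPU: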
  PID USER    PR  NI   VIRT   RES   SHR S %CPU %MEM   TIME+ COMMAND
51420 apache  20   0 397908 71520 17504 S  8.6  0.9 6:31.21 php-fpm
51421 apache  20   0 398128 73736 19496 S  8.6  0.9 6:33.04 php-fpm
... (additional processes)
Processes such as php-fpm, running as the apache user, are among the top CPU consumers here (8.6% each). They are a natural focal point for performance optimization, especially if they consistently show high CPU usage.
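Because a service like php-fpm runs as several worker processes, it can help to add up the %CPU of all rows that share a command name. The following sketch assumes the default top column order shown above; cpu_by_command is a hypothetical helper name.

```python
from collections import defaultdict

def cpu_by_command(rows: list[str]) -> dict[str, float]:
    """Sum the %CPU column of top process rows, grouped by COMMAND."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        cols = row.split()
        # Assumed column order:
        # PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        cpu = float(cols[8])
        command = " ".join(cols[11:])
        totals[command] += cpu
    return dict(totals)

# Process rows copied from the top output above.
rows = [
    "51420 apache 20 0 397908 71520 17504 S 8.6 0.9 6:31.21 php-fpm",
    "51421 apache 20 0 398128 73736 19496 S 8.6 0.9 6:33.04 php-fpm",
]
print(cpu_by_command(rows))  # {'php-fpm': 17.2}
```

Viewed together, the two workers account for about 17% of one CPU, which gives a clearer picture than the per-process figures.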
Besides top, you can list the most CPU-intensive processes with the following command:
sudo ps -eo pid,user,ppid,cmd,%mem,%cpu --sort=-%cpu | head
  PID USER   PPID CMD                         %MEM %CPU
50383 root  50381 /usr/libexec/oracle-cloud-a  3.8  0.7
52386 mysql     1 /usr/libexec/mysqld --based  6.9  0.2
    1 root      0 /usr/lib/systemd/systemd --  0.2  0.0
...
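If this listing needs to be processed further, each row can be split into fields. The sketch below assumes the exact column selection used in the command above (pid,user,ppid,cmd,%mem,%cpu); because the cmd column may contain spaces, it takes the first three and last two fields as fixed and treats everything in between as the command. The helper name parse_ps_line is hypothetical.

```python
def parse_ps_line(line: str) -> dict[str, object]:
    """Parse one data row of `ps -eo pid,user,ppid,cmd,%mem,%cpu` output."""
    parts = line.split()
    if len(parts) < 6:
        raise ValueError("unexpected ps row: " + line)
    return {
        "pid": int(parts[0]),
        "user": parts[1],
        "ppid": int(parts[2]),
        # cmd may contain spaces (and is truncated by ps to the column width).
        "cmd": " ".join(parts[3:-2]),
        "mem": float(parts[-2]),
        "cpu": float(parts[-1]),
    }

# Rows copied from the ps output above; the commands are truncated as shown there.
rows = [
    "50383 root 50381 /usr/libexec/oracle-cloud-a 3.8 0.7",
    "52386 mysql 1 /usr/libexec/mysqld --based 6.9 0.2",
]
processes = [parse_ps_line(row) for row in rows]
busiest = max(processes, key=lambda proc: proc["cpu"])
print(busiest["pid"], busiest["cpu"])  # 50383 0.7
```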