Analyzing I/O Performance in Linux

Introduction

Before diving into I/O performance analysis, it is worth reading our guide on analyzing CPU performance in Linux. This article builds on that foundation to help you pinpoint which disks are underperforming.

Using the iostat Command

The iostat command, part of the sysstat package, reports CPU utilization and detailed per-device I/O statistics. If it is not present, install the sysstat package:

  • On RHEL/CentOS: sudo yum install sysstat
  • On Ubuntu/Debian: sudo apt-get install sysstat
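
To confirm the installation and check which sysstat version you have (column layouts vary slightly between versions), you can run:

iostat -V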

Here’s an example output of iostat -x 15:

iostat -x 15
Linux 5.15.0-204.147.6.2.el8uek.x86_64 (techinfobest)   05/27/2024   _x86_64_   (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.03    0.20    0.33    0.01    0.16   98.27

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sda              30.00   80.00    15000.00  40000.00  1.00    2.00    5.00   10.00  1.50    3.00    1.00   500.00   500.00   1.50   50.00
sdb              25.00   60.00    12500.00  30000.00  0.50    1.00    2.00   5.00   1.70    2.80    0.90   500.00   500.00   1.70   45.00
sdc              500.00  1000.00  500000.00 1000000.00 5.00    10.00   5.00   10.00  15.00   25.00   15.00  500.00   500.00   5.00  100.00

Key Metrics to Monitor

  • r/s and w/s: Read and write requests per second.
  • rkB/s and wkB/s: Kilobytes read and written per second.
  • r_await and w_await: Average time, in milliseconds, that read and write requests take to be served, including time spent waiting in the queue.
  • %util: Percentage of elapsed time during which the device was handling I/O. A value close to 100% indicates the disk is saturated (fast SSDs and NVMe devices that handle requests in parallel can still have some headroom even at 100%).
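
As a rough convenience, you can filter the extended output so that only busy devices stand out. The one-liner below is a minimal sketch, assuming %util is the final column (as it is in current sysstat releases) and that your devices are named sd*, nvme*, or vd*; the 90% threshold is arbitrary:

iostat -dx 15 4 | awk '/^(sd|nvme|vd)/ { if ($NF + 0 > 90) print $1, "%util=" $NF }'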

Interpreting the Data

High r_await and w_await values mean requests are waiting longer to be served, which points to a potential bottleneck, and a high %util means the device is busy nearly all the time. In the output above, sdc is at 100% utilization, so it is operating at full capacity and is the most likely bottleneck. Comparing sda, sdb, and sdc side by side makes it clear which disk has the highest wait times and utilization.

Example Analysis

For instance, in the output above:

  • sda: Moderate load (30 reads/s, 80 writes/s) with low wait times (1.50 ms reads, 3.00 ms writes) and about 50% utilization, so it has plenty of headroom.
  • sdb: A slightly lighter load with comparable wait times and about 45% utilization, also performing adequately.
  • sdc: By far the heaviest load (500 reads/s, 1,000 writes/s), wait times roughly ten times higher (15 ms reads, 25 ms writes), and 100% utilization, indicating it has reached its performance limit.
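
Once you have identified the saturated disk, the next question is usually which processes are generating that I/O. One option is pidstat, which ships with the same sysstat package; the interval and count below are only an illustration:

pidstat -d 15 4    # per-process disk I/O (kB_read/s, kB_wrtn/s) every 15 seconds, 4 reports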

Scenario with I/O Issues

In the example output above, sdc shows 100% utilization (%util), indicating that it has reached its maximum capacity for read/write throughput and IOPS. This disk is a clear bottleneck and is likely causing performance issues.

The Value of 15 in iostat -x 15

The 15 in the command iostat -x 15 is the reporting interval: iostat prints a fresh set of averages every 15 seconds. Keep in mind that the first report shows averages since boot, while each subsequent report covers only the preceding interval, which is what you want when monitoring performance over time and spotting spikes in I/O activity.
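
You can also pass a count after the interval so iostat exits after a fixed number of reports, which is handy when you want to capture a sample rather than watch continuously; the values below are arbitrary:

iostat -xt 15 4    # extended stats with timestamps, every 15 seconds, 4 reports, then exit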

Checking Swap Usage and Its Impact on I/O

Sometimes system administrators notice high swap usage and assume it’s a sign of I/O performance issues. However, this isn’t always the case — especially on Linux, where the kernel might preemptively swap out idle memory pages even when there is plenty of free RAM.

You can check memory and swap usage using:

free -m

Example output:

              total        used        free      shared  buff/cache   available
Mem:         402370      271356      111370        2882       19643      124685
Swap:          8191        8105          86

Here, even though swap usage is very high (8105 MB out of 8191 MB), over 110 GB of RAM is still free, and over 124 GB is available. This might look concerning at first, but let’s confirm whether it is actually affecting performance.

Use the vmstat command to observe swap activity over time:

vmstat 1 5

Sample output:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0 8205960 113117808 2445388 17811172    0    1  5357   790    1    1 13  4 83  1  0
 1  0 8205960 113101784 2445396 17811168    0    0   344   309 18896 13551  5  3 92  0  0
 1  0 8205960 113108472 2445396 17811192    0    0  2112  2538 15889 11272  9  2 88  0  0
 1  0 8205960 113093040 2445396 17811224    0    0  1440   644 20056 14599 10  3 87  0  0
11  0 8205960 113096736 2445412 17811376    0    0   184  1191 33471 22185 20 13 67  0  0

Key indicators:

  • si (swap in) and so (swap out) are zero in every sampled interval (the first line reports averages since boot, which is why it shows so as 1), meaning no active swapping is happening.
  • CPU wait time (wa) is near zero, indicating no I/O bottlenecks.
  • I/O columns (bi, bo) are stable and not peaking.
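
For a longer-running view of swap activity, sar (also part of the sysstat package) can report swap-in and swap-out rates per second; the interval and count here are arbitrary:

sar -W 15 4    # pswpin/s and pswpout/s every 15 seconds, 4 reports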

Despite high swap usage, there is no I/O impact here. The kernel likely swapped out idle memory pages (e.g., from Oracle DB background processes) earlier, and since they’re not being accessed, they remain in swap.
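
To see which processes those swapped-out pages belong to, you can read the VmSwap field from /proc. This is a quick sketch; a tool such as smem gives a more polished view:

grep VmSwap /proc/[0-9]*/status 2>/dev/null | sort -k2 -n -r | head    # biggest swap users first, values in kB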

When Should You Be Concerned About Swap Usage?

Swap becomes a concern only if:

  • You see non-zero si and so values consistently in vmstat.
  • There’s increased CPU wait time (wa).
  • High disk I/O due to swap (bi, bo spikes).
  • Applications are slowing down or crashing due to memory pressure.

In those cases, you can:

  • Reduce vm.swappiness (e.g., set it to 10) using sysctl, as sketched after this list.
  • Reclaim swap with swapoff -a && swapon -a; this forces swapped pages back into RAM, so run it during low activity and only when enough free memory is available.
  • Consider upgrading RAM if usage is always near full.
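
Here is a minimal sketch of those adjustments; the swappiness value of 10 and the sysctl.d file name are only examples, so adapt them to your environment:

sudo sysctl vm.swappiness=10                                            # apply immediately
echo 'vm.swappiness = 10' | sudo tee /etc/sysctl.d/99-swappiness.conf   # persist across reboots
sudo swapoff -a && sudo swapon -a                                       # drain swap back into RAM; needs enough free memory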

This kind of analysis complements disk-level tools like iostat, giving you a holistic view of system health without misinterpreting memory usage signals.

Conclusion

By analyzing iostat -x output, you can identify specific disks with performance issues. High r_await, w_await, and %util values are indicators of potential I/O bottlenecks. Address these by optimizing disk usage, upgrading hardware, or redistributing the load across multiple disks.

For further insights on CPU performance and related I/O issues, visit TechInfoBest.
