I was recently tasked with solving a tricky disk performance issue on a Linux server. At first, I thought it would be a straightforward task. After all, I had been working in IT for several years and had seen my fair share of disk performance issues.
But as soon as I started digging into it, I realized this wasn’t going to be an easy fix. The system had five disks, and their performance seemed to vary from one moment to the next.
I started with the more obvious checks: free disk space, hardware status, partition layout, RAID status, and so on. None of them turned up anything unusual.
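For reference, these are the kinds of commands I mean by the basic checks. The device name /dev/sda is just a placeholder, so adjust it for your own system:

```bash
# Check free disk space on all mounted filesystems
df -h

# List block devices, partitions, and mount points
lsblk

# Check the SMART health summary of a disk (requires the smartmontools package)
sudo smartctl -H /dev/sda

# Check software RAID status, if mdadm RAID is in use
cat /proc/mdstat
```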
At this point, I was beginning to lose hope. But then I remembered something my mentor had said when I first started in IT: “When all else fails, go back to basics.”
I decided to start from the very beginning. I checked the physical connections of the disks, making sure all cables were securely connected and in the right places.
Once that was done, I moved on to checking disk performance itself. I started monitoring each disk with the iostat command to see if I could spot where performance was suffering. After a few hours of watching, I found one disk with abnormally high latency.
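If you want to run a similar check yourself, something along these lines should work; the interval and count are arbitrary examples, and the exact column names can vary slightly between sysstat versions:

```bash
# Extended per-device statistics every 2 seconds, 30 samples
# Key columns: r/s and w/s (IOPS), await (average I/O latency in ms), %util (device utilization)
iostat -x 2 30

# Add timestamps and keep a copy for later comparison
iostat -x -t 2 | tee /tmp/iostat.log
```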
Examining that disk more closely, I was able to pinpoint the periods when data transfers were taking much longer than usual, and further investigation showed that the slowdown coincided with a heavy workload on the disk.
During the problem window, the IOPS on that disk roughly tripled and its utilization sat at 100%.
Then I used the iotop command to see which process was generating so much load on that disk. It turned out to be a database process.
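For anyone who wants to reproduce this step: iotop needs root, and the flag combination below is just one common way to run it; pidstat is an alternative if iotop isn't installed:

```bash
# -o: only show processes actually doing I/O
# -P: list processes instead of individual threads
# -a: show accumulated I/O since iotop started, rather than instantaneous bandwidth
sudo iotop -oPa

# pidstat from the sysstat package also reports per-process disk I/O
pidstat -d 2
```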
I checked with the database team. It turned out that a SQL query added recently was triggering full table scans on the database side.
They fixed the query by adding an index. After that, the latency dropped significantly and the overall performance of the disk improved dramatically. Problem solved!
It took me a few hours to fix the issue, but in the end it was worth it. I learned an important lesson that day: Sometimes, if you take the time to go back to basics and examine things from a different angle, it can make all the difference.
Now, I can proudly say that I have a deep understanding of disk performance in Linux and the skill set to tackle any issue that might arise. And it all started with that one disk performance issue.
The moral of this story is simple: don’t give up. With the right knowledge and approach, any problem can be solved, even those that seem impossible at first. This was true for me when I faced my disk performance issue in Linux — and it can be true for you, too. Good luck!
Top comments (2)
Cool. It helps me a lot. Do you know how to troubleshoot an iowait issue in Linux? The iowait is really high in the top command output.
I have been struggling with this for days.
Glad to see it helped.
The iowait value in the top command output shows the percentage of time the processor was waiting for I/O to complete. It indicates that the system is waiting on disk or network I/O, and while it is waiting on those resources it cannot fully utilize the CPU.
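Before digging deeper, a few quick checks can confirm high iowait and narrow down which device is behind it; the intervals below are just examples:

```bash
# The "wa" column is the percentage of CPU time spent waiting on I/O
vmstat 2 5

# Per-CPU breakdown, including %iowait (sysstat package)
mpstat -P ALL 2 5

# Then find which device is slow: look for high await and %util
iostat -x 2 5
```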
Check this post for a detailed troubleshooting guide on iowait issues.
Hope that gives you some more ideas.