Monday, April 11, 2011

ext4 performance and the barrier option

We recently moved our application server's /home to a 6-disk SAS RAID 1+0 array with an ext4 filesystem to improve performance, and found that we still had a disk I/O bottleneck, with processor I/O wait averaging in the 3 to 4% range.

I came across this LWN article that talks about a 30% performance drop when the barrier option is enabled on an ext4 filesystem.

As of the time of the LWN article, barrier was not enabled by default on ext4. But somewhere along the kernel release line, barrier=1 (on) became the default mount option for ext4 filesystems.

Doing a "cat /proc/mounts" on my Ubuntu 10.04 system, I found that my ext4 filesystems did indeed have the option barrier=1. So on April 3rd, I added barrier=0 to the /home entry in /etc/fstab and rebooted. The performance boost was quite dramatic.
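For reference, the check-and-change steps look roughly like this (the UUID and mount point below are placeholders, not our exact configuration):

```shell
# Show mount options for ext4 filesystems; look for barrier=1 in the output.
grep ext4 /proc/mounts || true

# The /etc/fstab entry for the filesystem then gets barrier=0 appended to
# its options field, something like:
fstab_line='UUID=xxxx-xxxx  /home  ext4  defaults,barrier=0  0  2'

# Sanity-check that the options field really carries barrier=0 before rebooting:
echo "$fstab_line" | awk '{print $4}' | grep -q 'barrier=0' && echo "barrier disabled in fstab"
```

A reboot applies the change; on most kernels a live remount (mount -o remount,barrier=0 /home) should work as well.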

Using "average wait time", "% utilization" and "% I/O wait" as measures, here are the before-and-after stats, measured and averaged over a typical work week, 8am to 5pm...

With barrier=1:

Average wait time:        90 ms
Average disk utilization: 40%
Average CPU I/O wait:     3.5%

With barrier=0:

Average wait time:        2 ms
Average disk utilization: 2%
Average CPU I/O wait:     0.4%
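These numbers came from our monitoring; with the sysstat tools (assumed installed) you can watch the same metrics directly:

```shell
# Per-device stats every 60 seconds; "await" is the average wait time in ms
# and "%util" is disk utilization:
#   iostat -x 60
# CPU I/O wait over the day, from sar's daily data files:
#   sar -u

# A quick one-off check of cumulative CPU I/O wait can also be read from
# /proc/stat (field 6 of the "cpu" line is iowait jiffies):
awk '/^cpu /{printf "iowait share: %.2f%%\n", 100*$6/($2+$3+$4+$5+$6+$7+$8)}' /proc/stat
```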

Graphs depict the change more dramatically (comparing April 1st to April 4th):

[Graphs: Average Wait Time, Average % Utilization, Load Average]

NOTE: This is not a "very busy" April 1st compared to a "very light" April 4th. Both days carried a 30-user load of approx. 2300 processes, and every other day of the week in the comparison shows the same dramatic improvement.

Of course one can argue that we are sacrificing data integrity for performance, since without barriers a power failure or crash mid-write can leave the journal and data inconsistent. But we do have a UPS-backed system, and crashes are (thus far) non-existent.