After much performance monitoring with sar, and correlating the timing of the freezes with the sar data, we determined that I/O wait was the likely culprit. Real memory usage {used - (cached+buffered)} only ever peaked around 12G, and the CPUs were largely idle.
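As a rough illustration of the {used - (cached+buffered)} figure, here is a minimal Python sketch that derives it from /proc/meminfo-style text. It assumes "used" means MemTotal minus MemFree, which is how the classic free output defines it; the function names are mine, not part of any tool.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def real_used_kb(info):
    """Return used - (cached + buffered), in kB.

    Assumption: used = MemTotal - MemFree.
    """
    used = info["MemTotal"] - info["MemFree"]
    return used - (info["Cached"] + info["Buffers"])


def read_real_used_kb(path="/proc/meminfo"):
    """Convenience wrapper that reads the live values on a Linux host."""
    with open(path) as fh:
        return real_used_kb(parse_meminfo(fh.read()))
```

On a Linux box, read_real_used_kb() / 1024.0 ** 2 gives the figure in GiB, which can be compared against the ~12G peak mentioned above.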
The disk is a 10k rpm 300GB SAS 2.5" drive in RAID 1 (mirrored) on the HS21 onboard LSI controller.
I/O Wait is primarily disk I/O but could be network I/O as well, so we decided to tune disk and network.
The filesystem is ext3, mounted with noatime.
After finding this link, I tried the following mount options: noatime,nobh,data=writeback,commit=90
NOTE: For the root filesystem, you must also add "rootflags=nobh,data=writeback,commit=90" to the kernel line in grub.
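To make the two places these options go a bit more concrete, here is a small Python sketch that merges the new options into an existing option string and formats an /etc/fstab entry and the matching rootflags kernel argument. The device /dev/sdb1, the mount point /data and the helper names are hypothetical, chosen only for illustration; the fstab field order (device, mount point, type, options, dump, pass) is the standard one.

```python
def merge_mount_options(existing, extra):
    """Merge two comma-separated option strings.

    Options with the same key (the part before '=') are de-duplicated;
    the later value wins but keeps the position of the first occurrence.
    """
    merged = {}
    for opt in (existing + "," + extra).split(","):
        opt = opt.strip()
        if opt:
            merged[opt.split("=", 1)[0]] = opt
    return ",".join(merged.values())


def fstab_line(device, mountpoint, fstype, options, dump=0, passno=2):
    """Format one /etc/fstab entry with the six standard fields."""
    return "%s %s %s %s %d %d" % (device, mountpoint, fstype, options, dump, passno)


def rootflags_arg(options):
    """Format the kernel command-line argument used for the root filesystem."""
    return "rootflags=" + options


# Hypothetical example: a non-root ext3 data volume already mounted with noatime.
opts = merge_mount_options("noatime", "noatime,nobh,data=writeback,commit=90")
print(fstab_line("/dev/sdb1", "/data", "ext3", opts))
print(rootflags_arg("nobh,data=writeback,commit=90"))
```

The first printed line is what would go into /etc/fstab for that volume; the second is the string appended to the kernel line in grub when the filesystem in question is root.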
After a reboot, and once full load was back on the system, I saw a substantial reduction in average I/O wait time (from ~50ms to ~25ms) and in overall average % disk utilization (from ~15% to ~8%; statistics from iostat).
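For anyone who wants to see where numbers like these come from, the sketch below computes an average wait per I/O and a % utilization from two snapshots of /proc/diskstats. I am assuming the figures above correspond to iostat's await and %util columns; the function names are mine, and this is an approximation of the calculation rather than iostat's actual code.

```python
def parse_diskstats(text):
    """Parse /proc/diskstats-style text into {device: counters}.

    Whitespace-separated fields used (0-based): 2 = device name,
    3 = reads completed, 6 = ms spent reading, 7 = writes completed,
    10 = ms spent writing, 12 = ms the device was busy doing I/O.
    """
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue
        stats[f[2]] = {
            "ios": int(f[3]) + int(f[7]),
            "io_ms": int(f[6]) + int(f[10]),
            "busy_ms": int(f[12]),
        }
    return stats


def await_and_util(before, after, interval_s):
    """Return (average ms per completed I/O, % of the interval the disk was busy)."""
    ios = after["ios"] - before["ios"]
    io_ms = after["io_ms"] - before["io_ms"]
    busy_ms = after["busy_ms"] - before["busy_ms"]
    await_ms = io_ms / ios if ios else 0.0
    util_pct = 100.0 * busy_ms / (interval_s * 1000.0)
    return await_ms, util_pct
```

Reading /proc/diskstats twice, a known number of seconds apart, and passing the two entries for the same device to await_and_util gives values comparable to what iostat reports for that interval.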
This was a great improvement, but there were still occasional 1-2s freezes at this point, so I also tuned the network.
For network performance tuning I ran across this link and added the following to /etc/sysctl.conf:
net.ipv4.tcp_tw_reuse=1
net.ipv4.tcp_fin_timeout=30
net.ipv4.tcp_keepalive_time=1800
net.core.wmem_max=8388608
net.core.rmem_max=8388608
net.ipv4.tcp_rmem=4096 87380 8388608
net.ipv4.tcp_wmem=4096 87380 8388608
Together, these have stopped the "freezing" altogether. There are still periods of slowness due to load, but now they are few and far between.
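After editing /etc/sysctl.conf, the settings are loaded with sysctl -p. To confirm they actually took effect, a small check along these lines can compare the file against the live values under /proc/sys. This is only a sketch: the helper names are mine, and the key-to-path mapping simply replaces dots with slashes, which is fine for the keys above but not for keys containing interface names with dots.

```python
def parse_sysctl_conf(text):
    """Parse sysctl.conf-style text into {key: value}.

    Values are whitespace-normalized and surrounding double quotes are dropped.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        settings[key.strip()] = " ".join(value.strip().strip('"').split())
    return settings


def proc_path(key):
    """Map a sysctl key to its file under /proc/sys (simple dot-to-slash mapping)."""
    return "/proc/sys/" + key.replace(".", "/")


def mismatches(settings, read_value):
    """Return {key: (wanted, current)} for every setting that differs.

    read_value(path) must return the current value as a string.
    """
    bad = {}
    for key, want in settings.items():
        have = " ".join(read_value(proc_path(key)).split())
        if have != want:
            bad[key] = (want, have)
    return bad
```

On a live system, read_value can be as simple as lambda p: open(p).read(); an empty result from mismatches means every listed key matches the running kernel.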