AIX - What is IO Wait?
In AIX, what exactly is IO wait as reported by vmstat in the wa column?
AIX 4.3.2 and earlier.
At each clock interrupt on each processor (100 times a second per processor), a determination is made as to which of the four categories (usr/sys/wio/idle) the last 10 ms of time should be charged to. If the CPU was busy in usr mode at the time of the clock interrupt, then usr gets the clock tick added into its category. If the CPU was busy in kernel mode at the time of the clock interrupt, then the sys category gets the tick. If the CPU was not busy, a check is made to see if any I/O to disk is in progress (from any CPU). If any disk I/O is in progress, the wio category is incremented. If no disk I/O is in progress and the CPU is not busy, the idle category gets the tick.
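The per-tick decision described above can be sketched in a few lines of Python. This is only an illustrative model, not AIX kernel code: the function name classify_tick_pre433 and its arguments are hypothetical, and cpu_mode is assumed to be "usr", "sys", or None for a CPU that is not busy.

```python
def classify_tick_pre433(cpu_mode, any_disk_io_in_progress):
    """Model of the AIX 4.3.2-and-earlier rule for one 10 ms clock tick.

    cpu_mode: "usr", "sys", or None when the CPU is not busy (hypothetical encoding).
    any_disk_io_in_progress: True if disk I/O is pending anywhere in the system.
    """
    if cpu_mode == "usr":
        return "usr"
    if cpu_mode == "sys":
        return "sys"
    # CPU is not busy: a single pending disk I/O from any CPU marks the tick as wio.
    return "wio" if any_disk_io_in_progress else "idle"


# Four idle CPUs with one disk I/O pending somewhere: every CPU is charged wio.
ticks = [classify_tick_pre433(None, True) for _ in range(4)]
wio_share = 100.0 * ticks.count("wio") / len(ticks)
```

Under this rule a single thread doing I/O on an otherwise idle system drives wio to 100 percent on every CPU, which is why the older figures can look alarming.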
AIX 4.3.3 and later.
The change in AIX 4.3.3 is to only mark an idle CPU as wio if an outstanding I/O was started on that CPU. This method can report much lower wio times when just a few threads are doing I/O and the system is otherwise idle. For example, a system with four CPUs and one thread doing I/O will report a maximum of 25 percent wio time (one CPU out of four). A system with 12 CPUs and one thread doing I/O will report a maximum of 8.3 percent wio time (one CPU out of 12). I/O to and from NFS mounted file systems is reported as wait I/O time.
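The 25 percent and 8.3 percent figures follow directly from the per-CPU rule. The sketch below is a simplified model, not AIX code: the names classify_tick_433 and max_wio_percent_one_thread are hypothetical, and it assumes the single I/O thread's request is outstanding on exactly one CPU while every other CPU is idle with no I/O of its own.

```python
def classify_tick_433(cpu_mode, io_started_on_this_cpu):
    """Model of the AIX 4.3.3-and-later rule for one clock tick on one CPU.

    cpu_mode: "usr", "sys", or None when the CPU is not busy (hypothetical encoding).
    io_started_on_this_cpu: True if an outstanding I/O was started on this CPU.
    """
    if cpu_mode in ("usr", "sys"):
        return cpu_mode
    # Only an idle CPU that itself started the outstanding I/O is charged wio.
    return "wio" if io_started_on_this_cpu else "idle"


def max_wio_percent_one_thread(ncpus):
    """System-wide wio percentage when one thread waits on I/O started on CPU 0
    and all CPUs are otherwise idle (assumed scenario)."""
    ticks = [classify_tick_433(None, cpu == 0) for cpu in range(ncpus)]
    return 100.0 * ticks.count("wio") / ncpus


four_cpu = max_wio_percent_one_thread(4)     # 25.0
twelve_cpu = max_wio_percent_one_thread(12)  # about 8.3
```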
In summary, IO wait is the percentage of time a processor is idle but has at least one outstanding IO request (on AIX 4.3.3 and later, a request that was started on that processor).
High IO wait does not automatically mean you have a disk bottleneck. For example, you can see high IO wait during your backup window, when the application has been stopped for the backup and the CPUs have nothing to do except wait on IO.
Low IO wait does not automatically mean you do not have a disk bottleneck. Your application may be busy processing other requests while IOs are taking a long time to complete, in which case the time is charged to usr or sys rather than wio.
In conclusion, IO wait can be a misleading indicator of disk performance. Much better metrics to look at are the IO service times (avgserv / minserv / maxserv) and the IO queue statistics (avgwqsz / avgsqsz / sqfull) reported by the 'iostat -DR' command.