When less is more - Less VPs More Performance
I recently tuned an AIX LPAR running on a p7-780 frame, and the result showed that Fewer Virtual Processors and More Entitlement can equal Better Performance.
The LPAR configuration before tuning:
12 Virtual Processors (VPs) and an Entitlement of 6, with CPU Folding enabled (vpm_fold_policy=1)
Note that although this LPAR had 12 Virtual Processors, both mpstat and PowerVP showed that it never ran off its socket (chip), so at most it was only ever using 8 Physical Cores in the p7 server.
The LPAR configuration after tuning:
8 Virtual Processors (VPs) and an Entitlement of 8, with CPU Folding disabled (vpm_fold_policy=4)
Note that these settings almost always guarantee that the LPAR will be scheduled back onto its home cores in the p780 server. No other LPAR should be able to kick it off those cores while it is running, because it will always be running within its Entitlement.
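Why setting the VPs equal to the Entitlement keeps the LPAR within Entitlement comes down to simple arithmetic. A virtual processor can be dispatched on at most one physical core at a time, so an uncapped LPAR's peak consumption equals its VP count, and anything above the Entitlement has to be borrowed from the shared pool. The following is a minimal Python sketch of that arithmetic for the two configurations above. The `LparConfig` class is hypothetical and purely illustrative, and it ignores the socket boundary mentioned earlier, which in practice held the pre-tuning LPAR to 8 cores.

```python
from dataclasses import dataclass


@dataclass
class LparConfig:
    """Hypothetical model of a shared-processor LPAR (illustration only)."""
    virtual_processors: int
    entitlement: float  # guaranteed processing units

    def max_consumption(self) -> float:
        # Each VP runs on at most one physical core at a time,
        # so peak physical consumption equals the VP count.
        return float(self.virtual_processors)

    def uncapped_headroom(self) -> float:
        # Capacity above Entitlement that must come from the shared pool,
        # where other LPARs can compete for it.
        return self.max_consumption() - self.entitlement


before = LparConfig(virtual_processors=12, entitlement=6.0)
after = LparConfig(virtual_processors=8, entitlement=8.0)

for name, cfg in (("before", before), ("after", after)):
    print(f"{name}: peak {cfg.max_consumption():.1f} cores, "
          f"{cfg.uncapped_headroom():.1f} cores above entitlement")
```

With 8 VPs and an Entitlement of 8 the headroom above Entitlement is zero, so every cycle the LPAR can use is guaranteed capacity.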
The two graphs shown below are for the same LPAR, for the same time frame on the same day of the week. The first graph was produced from vmstat data before tuning and the second graph from data after the tuning.
Before the tuning, the LPAR was using around 5 to 6 Virtual Processors, with some large spikes to 8 or more. After the tuning, it was mostly using around 4 or 5 Virtual Processors, with two smaller, brief spikes to 8.
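Summarising the pc (physical processors consumed) column the way the graphs do is straightforward. This minimal sketch assumes the pc values have already been extracted from the vmstat output into a list. The readings below are hypothetical and are not the data behind the graphs, and `summarise_pc` is an illustrative helper. It uses the median for the typical level because, unlike the mean, the median is not dragged upward by occasional spikes.

```python
from statistics import median


def summarise_pc(samples, spike_threshold=8.0):
    """Return (typical pc level, number of spike intervals)."""
    # Median: robust to the short spikes that would inflate a mean.
    typical = median(samples)
    # Count intervals at or above the spike threshold (8 cores here).
    spikes = sum(1 for pc in samples if pc >= spike_threshold)
    return typical, spikes


# Hypothetical pc readings from consecutive vmstat intervals.
before_pc = [5.2, 5.8, 6.1, 5.5, 8.4, 5.9, 6.0, 8.9, 5.4]
after_pc = [4.3, 4.8, 4.6, 5.1, 8.0, 4.4, 4.9, 8.0, 4.5]

print("before:", summarise_pc(before_pc))
print("after: ", summarise_pc(after_pc))
```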
What surprised me most was that the combined user and system CPU time actually dropped from an average of almost 3 Virtual Processors to under 2: the same amount of data was being processed more efficiently with fewer cores.
System time averaged around 19% before the tuning and dropped to around 13% after it. In my opinion, one of the major factors is the CPU system time overhead of folding and unfolding CPUs, which is much lower after the tuning because we no longer fold vCPUs. In the vmstat data I could see that whenever the PC column rose by 1 or more, system time would spike by around 8 to 10%, indicating the extra system time needed to unfold another Virtual Processor.
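The correlation described above, with system time jumping whenever pc rises by 1 or more between intervals, can be checked mechanically. This minimal sketch assumes the pc and sy columns have already been extracted from consecutive vmstat intervals into two lists of equal length. The sample values and the `unfold_overhead_events` helper are hypothetical.

```python
def unfold_overhead_events(pc, sy, pc_step=1.0):
    """Return (interval index, change in sy) wherever pc rose by >= pc_step."""
    events = []
    for i in range(1, len(pc)):
        # A rise of a whole core or more suggests another VP was unfolded.
        if pc[i] - pc[i - 1] >= pc_step:
            events.append((i, sy[i] - sy[i - 1]))
    return events


# Hypothetical consecutive vmstat intervals, not the author's data.
pc = [5.1, 5.3, 6.4, 6.2, 6.3, 7.5, 7.4]
sy = [17, 18, 27, 19, 18, 28, 20]

for index, sy_jump in unfold_overhead_events(pc, sy):
    print(f"interval {index}: pc rose by >= 1, sy changed by {sy_jump:+d}")
```

This only flags intervals where the two coincide. It cannot prove that unfolding caused the extra system time, but it makes the pattern easy to count before and after a change such as disabling folding.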
User time averaged around 34% before tuning and dropped to around 25% after. I can only assume that user time decreased because the user code could run for longer on each Virtual Processor, with better cache efficiency, without switching to system (kernel) time so often.
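The averages quoted above can be reconciled with the claim that user plus system time fell from almost 3 Virtual Processors to under 2. The rough sketch below rests on two assumptions that are not figures from the original vmstat data. First, it reads us and sy as percentages of the physical processors consumed (pc). Second, it takes 5.5 and 4.5 as representative pc values from the ranges of 5 to 6 and 4 to 5 described earlier. `busy_cores` is a hypothetical helper.

```python
def busy_cores(us_pct, sy_pct, pc):
    """Estimate cores spent in user + system time (hypothetical helper)."""
    # ASSUMPTION: us and sy are treated as percentages of pc.
    return (us_pct + sy_pct) / 100.0 * pc


# us/sy averages from the text; pc values are assumed midpoints.
before = busy_cores(us_pct=34, sy_pct=19, pc=5.5)
after = busy_cores(us_pct=25, sy_pct=13, pc=4.5)

print(f"before: {before:.2f} cores, after: {after:.2f} cores")
```

With these inputs the estimate comes out at roughly 2.9 cores before and 1.7 after, which is consistent with the figures in the text.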
LPAR vmstat output before tuning:
LPAR vmstat output after tuning: