AIX NIM Server Tuning - Part 2
Following on from Part 1 of AIX NIM Server Tuning, here is Part 2. In this part we will cover AIX kernel scheduler tuning, JFS and JFS2 mount options, and LPAR Virtual Processor sizing.
CPU Default Scheduling.
We noticed in the vmstat outputs that there was a significant amount of idle CPU time and that the LPAR was consuming just over 2 cores of physical CPU resources even though the LPAR was configured with 3 Virtual Processors.
You need to be aware that the idle (id) column in the vmstat output is a percentage of the pc column (Physical Cores consumed). For example, if the idle column shows 10 and the pc column shows 2.20, then the LPAR had 0.22 cores of idle CPU.
The mpstat output showed that, under load, the LPAR was generally running work on the primary and secondary threads of each Virtual Processor. In the mpstat -w output below, which shows the statistics for each Logical Processor since boot time, we can see higher values for the primary threads, followed by the secondary threads. The primary threads are rarely idle (2 to 3%), the secondary threads are around 50% idle, and the 3rd and 4th threads of each Virtual Processor are hardly ever used, at over 90% idle.
# mpstat -w
System configuration: lcpu=12 ent=0.4 mode=Uncapped
cpu              min            maj              cs             ics  us  sy  wa  id
  0  145,603,641,356    571,568,742   6,905,336,411  2,220,354,598  41  57   0   2
  1    7,655,687,977     35,430,147   1,141,677,138    587,266,848  20  15   1  64
  2      233,750,378      1,106,131     113,391,706     34,562,929   2   3   -  95
  3      213,422,093      1,027,726     111,350,684     32,997,193   2   3   -  95
  4   92,353,582,200    343,388,483   2,672,050,462    609,263,431  43  53   1   3
  5    4,539,093,429     51,839,487     591,113,394    145,824,190  29  20   1  50
  6      278,943,511      2,426,417     111,583,439     24,535,832   5   6   0  89
  7      248,242,757      2,108,107     104,836,790     22,772,560   4   6   0  90
  8   84,280,471,033     77,712,396   1,514,576,122    568,460,299  47  51   0   2
  9    5,189,339,612     19,881,068     614,246,289    158,468,986  28  20   1  51
 10      217,826,048      1,024,729     106,114,466     26,471,167   5   4   0  91
 11      201,046,831        970,238     104,985,007     25,820,451   4   4   0  91
ALL  341,015,047,225  1,108,483,671  14,091,261,908  4,456,798,484  30  36   0  34
As we had multiple threads running on the CPUs at the same time, we decided to tell AIX to exploit the primary and secondary SMT threads on each Virtual Processor before unfolding the next Virtual Processor. This is done with the vpm_throughput_mode option in schedo.
schedo -p -o vpm_throughput_mode=2
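As a rough sketch (exact flag handling can vary slightly between AIX levels), the current value and the valid range of the tunable can be checked with schedo before and after making the change; the -p flag above is what makes the new value persistent across reboots:
# schedo -o vpm_throughput_mode
# schedo -L vpm_throughput_mode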
Results after Initial Tuning.
After the initial tuning we had a NIM server that was running a lot more efficiently. The LPAR was doing more file IO than before and we had plenty of free memory pages available for the application and IOs. As we were doing more IO, you will notice that the lrud process was scanning and freeing more memory pages than before. We were consuming slightly more cores (pc) but also had higher values in the idle (id) column.
kthr memory page faults cpu time
--------------- --------------------- ------------------------------------ ------------------ ----------------------- --------
r b p w avm fre fi fo pi po fr sr in sy cs us sy id wa pc ec hr mi se
4 0 0 0 1073705 13690 383 24641 0 0 20182 20285 7468 76347 15619 54 29 17 0 2.90 725.9 02:06:32
3 0 0 0 1073492 14349 130 20865 0 0 21575 21885 7248 73100 16732 50 30 20 0 2.65 663.3 02:06:33
8 0 0 0 1073492 14262 128 20672 0 0 20467 20736 7601 71825 16606 48 32 20 0 2.64 660.6 02:06:34
2 0 0 0 1073919 17662 255 16354 0 0 20473 20678 6948 66780 14732 43 34 24 0 2.47 617.1 02:06:35
4 0 0 0 1073522 16475 256 16642 0 0 15259 15487 7028 69599 15358 44 33 23 0 2.43 607.5 02:06:36
3 0 0 0 1073711 19002 128 17789 0 0 20472 20548 7410 70395 16235 44 33 22 0 2.45 613.7 02:06:37
5 0 0 0 1073711 14163 256 19934 0 0 15342 15394 8045 66081 15419 45 32 23 0 2.56 640.8 02:06:38
6 0 0 0 1073917 13277 286 15787 0 0 15356 25049 6991 69595 15081 42 34 24 0 2.44 609.6 02:06:39
2 0 0 0 1073702 13103 128 15520 0 0 15357 15433 6682 59849 13599 41 33 25 0 2.37 593.4 02:06:40
4 1 0 0 1073702 17186 127 16383 0 0 20330 20650 7777 61129 15004 40 37 24 0 2.47 618.3 02:06:41
0 0 0 0 1073920 10535 383 21379 0 0 15323 15414 7572 73314 16874 47 33 20 0 2.71 677.7 02:06:42
11 0 0 0 1073909 12555 128 20928 0 0 23088 23171 7943 72006 16696 48 32 20 0 2.72 679.1 02:06:43
5 0 0 0 1073667 12837 255 22614 0 0 22938 23176 7481 74535 16348 48 32 19 0 2.77 692.2 02:06:44
6 0 0 0 1073667 18269 127 20122 0 0 25589 25880 6917 64947 14676 46 34 21 0 2.60 651.0 02:06:45
7 0 0 0 1073884 18786 0 19732 0 0 20461 20583 7370 73361 15212 47 32 21 0 2.62 655.3 02:06:46
4 0 0 0 1073676 15665 255 18424 0 0 15334 15551 7122 68738 14225 44 32 24 0 2.56 639.6 02:06:47
4 0 0 0 1073680 10574 255 20247 0 0 15342 15433 7348 70591 15388 48 32 21 0 2.62 655.5 02:06:48
5 0 0 0 1073680 11361 127 20408 0 0 21597 21742 7787 70026 16548 46 32 22 0 2.66 665.2 02:06:49
5 0 0 0 1073665 12585 256 23198 0 0 24408 24807 7737 76005 15311 51 31 18 0 2.82 703.8 02:06:50
6 0 0 0 1073907 10970 127 21787 0 0 20453 20562 7523 76093 16650 49 33 19 0 2.72 681.1 02:06:51
JFS and JFS2 Mount Options
JFS and JFS2 file-systems have a couple of rarely used mount options that can significantly reduce the amount of work the lrud process needs to do when scanning for and freeing memory pages. These are the release-behind-write (rbw) and release-behind-read (rbr) mount options, which can be combined as rbrw. These options tell the file-system to put the file-cache pages used by a read or write straight back onto the free list once the IO has completed, that is, once the data has been passed to the application (for reads) or written out to disk (for writes), and they only apply when the files are being read or written sequentially.
For our NIM server, the mksysb images being written to disk and then backed up by TSM were, on average, larger than the amount of memory in the LPAR. Each of these files would be written once to disk and then read once by the TSM backup process. The mksysb images were written overnight, mostly between 10pm and 6am, and the TSM backup would run late in the afternoon, around 4pm. So there was little point in keeping these file pages in the file cache after the IO had completed.
So we chose to mount the /export/mksysb file-system with the rbrw mount option, as shown below.
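As a sketch of how the option can be applied (assuming a standard JFS2 entry in /etc/filesystems; if the file-system already has other mount options set, they need to be kept in the options list), the change can be made permanent with chfs and picked up by remounting:
# chfs -a options=rbrw /export/mksysb
# umount /export/mksysb
# mount /export/mksysb
Alternatively, the option can be tried temporarily with mount -o rbrw /export/mksysb before committing it to /etc/filesystems.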
We had a number of other file-systems on the NIM server that were used to store nmon log files from every LPAR. The current nmon log files were written to the nmon_data file-system and kept for 4 days, and key nmon files were copied into an archive file-system (nmon_archive) and kept for 365 days. Most of these log files are written once and never re-written or read (except by the TSM backup).
So we chose to mount both these file-systems with the rbrw mount option.
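To confirm the option is in effect after remounting, the mount command shows the options of each mounted file-system and lsfs shows what is recorded in /etc/filesystems (using /export/mksysb as the example here, since the full paths of the nmon file-systems are site specific):
# mount | grep mksysb
# lsfs /export/mksysb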
LPAR Virtual Processors.
We noticed that the idle CPU time was around 20 to 25% of the average 2.50 cores (pc) consumed. This showed that the LPAR was generally using around 2 physical cores of CPU resource but was spreading the workload over the 3 Virtual Processors assigned to the LPAR. If we reduce the number of Virtual Processors to 2 and increase the Entitlement to 2.0, the LPAR should drive both of its Virtual Processors to full capacity and make better use of the SMT2 and SMT4 hardware threads.
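Once the Virtual Processor and Entitlement changes have been made from the HMC, the new configuration can be confirmed from within the LPAR, for example:
# lparstat -i | grep -i "Entitled Capacity"
# lparstat -i | grep -i "Online Virtual CPUs"
lparstat 1 5 can then be used to watch the physical cores consumed (physc) and entitled capacity consumed (%entc) columns while the workload is running.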
Results after Tuning.
After all the tuning had been done, our NIM server was running the same workload on one fewer Virtual Processor with an increased Entitlement. You will see in the vmstat output below that we are using the 2 cores we have almost 100% of the time, with little or no idle cycles. The L2 and L3 caches on these cores are therefore more likely to hold our data, allowing more efficient use of the cores and caches. We have little or no lrud activity, which frees up CPU cycles for our application and network processes to use. The free memory list (fre) stays consistently in six figures, allowing more efficient disk IO.
kthr memory page faults cpu time
--------------- --------------------- ------------------------------------ ------------------ ----------------------- --------
r b p w avm fre fi fo pi po fr sr in sy cs us sy id wa pc ec hr mi se
7 0 0 0 995505 326966 0 17678 0 0 0 0 7112 68828 14431 51 35 13 0 1.97 98.6 02:06:25
8 0 0 0 995552 327050 0 17568 0 0 0 0 7279 72036 14424 52 36 12 0 1.98 99.0 02:06:26
8 0 0 0 995336 327061 0 16159 0 0 0 0 6859 73393 13708 50 39 10 0 1.98 99.1 02:06:27
8 0 0 0 995090 325747 0 14239 0 0 0 0 6206 60214 13307 42 48 10 0 1.99 99.6 02:06:28
0 0 0 0 995027 327540 0 15871 0 0 0 0 5516 58712 10524 44 49 7 0 1.98 99.0 02:06:29
13 0 0 0 995649 324925 0 13084 0 0 0 0 4924 66620 9015 45 54 2 0 2.00 100.0 02:06:30
12 0 0 0 995544 322922 0 17198 0 0 0 0 6112 70740 10590 51 47 2 0 2.00 100.0 02:06:31
9 0 0 0 995476 323617 0 18129 0 0 0 0 5902 65764 9531 52 46 2 0 2.00 100.0 02:06:32
9 0 0 0 995098 319576 0 14784 0 0 0 0 6020 65647 11772 50 44 5 0 2.00 100.1 02:06:33
5 0 0 0 995569 317059 0 15721 0 0 0 0 5633 63204 10362 49 46 5 0 2.00 99.9 02:06:34
11 0 0 0 995508 323408 0 23193 0 0 0 0 5509 62857 9210 49 48 2 0 2.00 100.0 02:06:35
9 0 0 0 995513 325071 0 20511 0 0 0 0 5854 65591 10258 51 47 2 0 2.00 100.0 02:06:36
10 0 0 0 995588 326988 0 18047 0 0 0 0 5746 61239 9468 50 46 4 0 2.00 100.1 02:06:37
7 0 0 0 995098 327361 0 18239 0 0 0 0 6684 64039 12467 46 48 6 0 2.00 99.9 02:06:38
6 0 0 0 995506 327068 0 18719 0 0 0 0 7270 65805 14349 50 40 10 0 2.00 99.9 02:06:39
8 0 0 0 995508 327010 0 15936 0 0 0 0 6417 69761 12911 48 42 10 0 1.99 99.3 02:06:40
9 0 0 0 996080 326363 0 17671 0 0 0 0 6728 71183 13166 53 40 8 0 1.99 99.6 02:06:41
8 0 0 0 995130 327364 0 19832 0 0 0 0 7755 70283 13429 54 38 8 0 2.00 99.8 02:06:42
10 0 0 0 995130 327393 0 19890 0 0 0 0 7269 66524 12878 52 40 8 0 1.98 99.1 02:06:43
6 0 0 0 995538 326867 0 20026 0 0 0 0 6539 68116 11951 55 40 5 0 2.00 99.9 02:06:44