DataStage ETL (Extract, Transform & Load) Tuning
The Scope
The scope of this data migration project was to extract over 20TB of data, including 4 million JPEG images, from an Oracle database running on Solaris, then transform and load the data into several Oracle databases running on AIX LPARs on a p6 p570. The ETL engine ran on an IBM p5 p595 with Hitachi USP V storage, running AIX 6.1 TL3 SP1 and VIO Server 2.1. The process had to complete in less than 5 days.
Project Status
When I joined the project team, the ETL migration process was estimated to take in excess of 2 weeks to complete.
The Challenge
Reduce the run time of the ETL data migration process to less than 4 days.
The Investigation
Working closely with the Senior ETL Architect, we sat down and discussed how the ETL application works and how it utilises the hardware resources available to it. We utilised the Power Systems Configuration and Performance Graphs to analyse and baseline an ETL migration run at both the LPAR and Power system level.
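To supplement the graphs, a baseline can also be captured with the standard AIX performance tools during a migration run. The commands below are an illustrative sketch only; the intervals and counts shown are arbitrary rather than the values used on this project:

nmon -f -s 60 -c 1440    # record system statistics every 60 seconds for 24 hours to an nmon file
lparstat 60 60           # LPAR processor utilisation and entitlement consumption
vmstat -Iwt 60 60        # memory, paging and I/O wait
iostat -DRTl 60 60       # extended per-disk statistics (service times and queue behaviour)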
The areas we needed to address were:
CPU Allocations for the LPARs and Power System.
Memory Allocations for the LPARs and Power System.
Disk pool utilisation within the Hitachi USP V storage array.
Disk allocations from the Hitachi USP V to the AIX host.
I/O Adapter layout and utilisation.
Disk queue depths on AIX and VIO Servers.
AIX mount options for JFS2 filesystems.
AIX Logical Volume Manager (LVM) and Virtual Memory Manager (VMM) tuning.
AIX filesystem layout and application data segregation.
Application tuning to take full advantage of the above changes.
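When reviewing these areas on AIX, the existing values can be inspected with commands such as the following (hdisk10 and datavg are example names only, not those used on the project):

lsattr -El hdisk10 -a queue_depth    # current disk queue depth
lspath -l hdisk10                    # MPIO paths and their status
mount                                # current filesystem mount options
vmo -a                               # current VMM tunables
ioo -a                               # current I/O tunables
lsvg -l datavg                       # logical volume and filesystem layout within a volume group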
The Solution
The following changes to the environment were made as a result of the investigation and analysis performed. Only one change was made to the environment before each test cycle was run. This allowed us to analyse the results of each change and ensure it had a positive impact.
Unused LPARs running on the Power System were decommissioned.
Two non-data-migration LPARs were moved to another Power System server.
The spare CPU and memory resources were allocated to the data migration LPAR and the Oracle database LPAR.
Three new disk pools were created within the Hitachi USP V storage array: two 10TB pools allocated to the AIX host for the ETL data files and one 4TB pool allocated to the AIX host for the ETL scratch and sort areas.
The 20TB of disk for data files was assigned to the VIO servers as 20x 1TB LUNs, which were in turn presented to the host on 2x VSCSI adapters per VIO server. Each VSCSI adapter had 10 LUNs presented down it, with load balancing handled via the MPIO path priority.
The 4TB of disk for the scratch and sort areas was presented directly to the host via Fibre Channel (FCS) adapters as 16x 256GB LUNs, with load balancing handled via the MPIO path priority.
AIX and VIO server queue depths were tuned to align with the queue depths configured in the Hitachi USP V storage array. The Hitachi USP V has a fixed queue depth of 32 for each LUN presented and 2048 for each fibre channel adapter, so all AIX and VIO server disks from the Hitachi USP V storage array were set to a queue depth of 30 (example commands for the queue depth and path priority settings are shown after the mount options below).
As all the ETL data files were re-creatable in the event of a system crash, it was decided that we would mount all the JFS2 filesystems with JFS2 journal logging switched off. We also decided to mount the ETL data file filesystems with the release-behind-when-reading and release-behind-when-writing capabilities to avoid filling the AIX filesystem cache with file data.
The ETL data filesystems were mounted with the ‘-o rbrw,log=NULL’ option.
The ETL scratch and sort filesystems were mounted with the ‘-o log=NULL’ option.
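As an illustration of how these disk and filesystem settings are applied on AIX (the device and filesystem names below are examples only, not the names used on the project):

chdev -l hdisk10 -a queue_depth=30 -P        # set the disk queue depth (-P applies the change at the next reboot)
chpath -l hdisk10 -p vscsi0 -a priority=1    # preferred path for this LUN via the first VSCSI adapter
chpath -l hdisk10 -p vscsi1 -a priority=2    # lower-priority path via the second VSCSI adapter
mount -o rbrw,log=NULL /etldata01            # ETL data filesystem: release-behind, JFS2 logging off
mount -o log=NULL /etlscratch01              # ETL scratch/sort filesystem: JFS2 logging off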
The following AIX VMM tuning options were set:
strict_maxclient = "1"
strict_maxperm = "1"
minperm% = "5"
maxperm% = "16"
maxclient% = "16"
minfree = "1280"
maxfree = "1536"
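These VMM tunables are set with the vmo command, using -p to make the change persistent across reboots; note that on AIX 6.1 several of them are classed as restricted tunables, so vmo warns before applying the change. For example:

vmo -p -o strict_maxclient=1 -o strict_maxperm=1 \
    -o minperm%=5 -o maxperm%=16 -o maxclient%=16 \
    -o minfree=1280 -o maxfree=1536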
The following AIX LVM tuning options were set:
numfsbufs = "1024"
numclust = "64"
j2_nBufferPerPagerDevice = "12288"
j2_dynamicBufferPreallocation = "256"
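These are ioo tunables and are set in the same way as the vmo options above; for example:

ioo -p -o numfsbufs=1024 -o numclust=64 \
    -o j2_nBufferPerPagerDevice=12288 -o j2_dynamicBufferPreallocation=256

Note that numfsbufs and j2_nBufferPerPagerDevice are allocated per filesystem at mount time, so the filesystems need to be remounted before the new values take effect.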