PowerVM LPM with a Dead VIO Server
The scenario we are simulating is this: the frame needs hardware maintenance, so all the active LPARs have to be moved to another frame with LPM to avoid any outages. The problem is that the hardware issue has also caused one of my VIO Servers to die.
The question is: how do you run an LPM when one of the VIO Servers on the source frame is down?
The answer is the allow_inactive_source_storage_vios=1 setting.
Here are the details.
VIO Server version: 2.2.5.10
HMC version: V8R8.6.0.1
Firmware version: FW860.20 (SC860_082)
After shutting down the VIO Server, I attempted to do the LPM (migrlpar), but this failed with messages like these:
HSCLA246 The management console cannot communicate with partition vio41. This may be because the network is not available, the partition does not have a level of software that is capable of supporting this operation, or the RMC connection to the partition has not been established. Verify the network setup on the management console and the partition and that an RMC connection between the management console and the partition has been established, and try the operation again.
HSCLA298 Virtual I/O Server partition vio41 (2*8286-42A) does not support partition mobility.
HSCL400A There was a problem running the VIOS command.
HSCLA29A The RMC command issued to partition vio41 failed.
Testing the return code confirmed that the LPM (migrlpar) command had failed.
> echo $?
1
The fix was to run this command from the HMC:
migrlpar -r sys -m {framename} -o set -i allow_inactive_source_storage_vios=1
So I ran the LPM command (migrlpar) again and this time it worked.
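Since the goal is to move every active LPAR off the frame, the two steps can be strung together. Here is a rough sketch that only prints the HMC commands for review, it does not run them. The function name lpm_commands and the frame and LPAR names are made up for illustration. The migration line assumes the usual migrlpar -o m -m source -t target -p lpar form, since the exact invocation I used is not shown above.

```shell
# Print (do not run) the HMC commands needed to move a list of LPARs
# off a source frame that has an inactive VIO Server.
# Usage: lpm_commands SOURCE_FRAME TARGET_FRAME LPAR [LPAR ...]
lpm_commands() {
    src=$1
    tgt=$2
    shift 2

    # Step 1: allow LPM even though a source storage VIOS is inactive.
    echo "migrlpar -r sys -m $src -o set -i allow_inactive_source_storage_vios=1"

    # Step 2: one migration per LPAR (assumed standard migrlpar syntax).
    for lpar in "$@"; do
        echo "migrlpar -o m -m $src -t $tgt -p $lpar"
    done
}
```

For example, lpm_commands frameA frameB lpar1 lpar2 prints one "set" line followed by two migration lines, which can be checked by eye before pasting them into the HMC session.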
I still saw a number of warning messages like these:
HSCL400A There was a problem running the VIOS command.
HSCLA29A The RMC command issued to partition vio41 failed.
2610-652 The command execution has exceeded the allowable time limit (60 seconds) has been exceeded.
Testing the return code confirmed that the LPM was successful.
> echo $?
0
This was further confirmed by the running LPAR now being on the target frame.
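Because the HSCL400A and HSCLA29A messages show up on both the failed run and the successful one, the exit status is the thing to script against, not the message text. A minimal sketch of that check follows. The helper name run_checked is hypothetical, and it assumes the command is being driven from a POSIX shell, for example over ssh to the HMC.

```shell
# Run a command and report the outcome from its exit status only,
# regardless of any warnings the command printed along the way.
run_checked() {
    rc=0
    "$@" || rc=$?          # capture the status without tripping "set -e"
    if [ "$rc" -eq 0 ]; then
        echo "OK: $1 returned 0"
    else
        echo "FAILED: $1 returned $rc"
    fi
    return "$rc"
}
```

Prefixing the migration command with run_checked gives a single OK or FAILED line at the end of the output, which is easier to spot than a bare echo $? when many LPARs are being moved.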
I then restarted the failed VIO Server and noticed that neither the VTD devices nor the vFCS devices had been removed.
These devices were left in the Defined state, so you should ideally delete them all before you migrate any LPARs back. Here is one example:
SVSA            Physloc                                      Client Partition ID
--------------- -------------------------------------------- ------------------
vhost7          U8286.42A.V2-C1490                           0x00000000

VTD                   t102-rvg1
Status                Defined
LUN                   0x8100000000000000
Backing device        hdisk10
Physloc               U78C9.001.P1-C3-T1-W500000590-L9000000000000
Mirrored              false
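With many LPARs there can be a lot of these leftovers, so it helps to pull the Defined VTDs out of the listing mechanically. The sketch below reads lsmap-style text like the above on standard input and prints an rmvdev -vtd line for each VTD whose Status is Defined. It only prints the commands and removes nothing. The function names and the idea of saving the listing to a file first are my assumptions, rmvdev -vtd is the usual VIOS command for removing a VTD rather than something shown above, and leftover vFC mappings are not handled here.

```shell
# Print the name of every VTD whose Status is "Defined".
# Reads lsmap-style output on stdin.
defined_vtds() {
    awk '
        $1 == "VTD"    { vtd = $2 }                 # remember the current VTD name
        $1 == "Status" {
            if ($2 == "Defined" && vtd != "") print vtd
            vtd = ""                                # reset for the next mapping
        }
    '
}

# Turn that list into rmvdev commands for review (nothing is executed).
rmvdev_commands() {
    defined_vtds | while read -r vtd; do
        echo "rmvdev -vtd $vtd"
    done
}
```

Fed the mapping shown above, rmvdev_commands prints a single rmvdev line for t102-rvg1. Review the list before running anything on the VIO Server, because a VTD in the Defined state on a freshly restarted server is not always a leftover.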