Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution
1952478.1 : Pillar Axiom: Pilot Control Unit Replacement Fails if Staged Software Version is Higher than Installed Software Version
Pilot Control Unit replacement on R5.x systems will fail if the staged software version is higher than the installed software version. The cause is an architectural issue that is resolved only in R6.
Oracle Confidential PARTNER - Available to partners (SUN).

Applies to:
Pillar Axiom 600 Storage System - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms
On a Pilot Control Unit replacement, where the replacement CU is imaged to Release 5.4, the replacement Pilot may reboot continuously, typically 20-30 seconds after the root prompt appears.

Verifying this Staged Software Issue as Cause
The symptom for this mismatched software issue is:
Changes
Replacement Pilots for all Ax600 systems at 5.2.0 or higher should be 1450-00268-32 or 1450-00314-00. 1450-00268-32 is an A1811E Pilot imaged to Release 5.4.0, and 1450-00314-00 is an A1811 Pilot imaged to 5.4.0.

If the system was upgraded from R4 to R5, the 1450-00314-00 A1811 Pilot can be used for replacement. Replacing an A1811E Pilot CU (1450-00268-32) with a 1450-00314-00 A1811 Pilot is valid but not recommended: it downgrades the Pilot hardware configuration, which may not be acceptable to a customer who expects the faster A1811E Pilot.

Release 5.4.0 modified the method used for Pilot CU replacement to resolve multiple Severity 1 bugs in lower releases. The replacement Pilot CU checks the installed software version on the surviving Pilot CU using an rpm database query, then attempts to scp the RPMs for that installed software from the surviving Pilot. This design has an architectural flaw: R5.x Pilots do not store the installed software. They store only the staged software, in /var/unpacked. The /var/installed directory contains only files used for replacing Slammer or Brick components.

If the R5 system is at release 5.3.x or below, do not replace a failed Pilot with either the 1450-00268-32 A1811E Pilot or the 1450-00314-00 A1811 Pilot, as this can cause the entire system to go down. A double Pilot replacement is required on an R5 system that is below release 5.4.x.

Cause
Reference Bug 20109721

As the replacement Pilot boots, it checks its installed RPM versions and compares them to the ones on the surviving Pilot CU. If the versions do not match, the replacement Pilot CU attempts to copy the main RPMs from /var/unpacked on the surviving Pilot. If a higher software version has been staged, the necessary RPMs will no longer be in /var/unpacked on the surviving Pilot CU.
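The failure condition above comes down to one comparison: is the staged release higher than the installed release? The following is a minimal sketch of that comparison, not part of the Oracle procedure. The function name `staged_is_higher` and the release strings are hypothetical, the releases are assumed to be plain dotted numbers typed in by the operator, and the script assumes a workstation with GNU `sort -V` (it is not known whether the Pilot OS provides it).

```shell
#!/bin/sh
# Hypothetical helper (not an Axiom tool): succeeds (exit 0) when the staged
# release is strictly higher than the installed release.
# Assumes dotted numeric release strings and GNU sort with -V (version sort).
staged_is_higher() {
    installed="$1"
    staged="$2"
    # Equal releases are not "higher".
    [ "$installed" = "$staged" ] && return 1
    # Version-sort both strings; the last line is the higher release.
    highest=$(printf '%s\n%s\n' "$installed" "$staged" | sort -V | tail -n 1)
    [ "$highest" = "$staged" ]
}

# Placeholder values for illustration only; substitute the real releases.
if staged_is_higher "5.4.0" "5.4.1"; then
    echo "Staged release is higher than installed: Pilot CU software sync is at risk"
else
    echo "Staged release is not higher than installed"
fi
```

Version sort is used rather than a plain string comparison so that a release such as 5.4.10 ranks above 5.4.9.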
The replacement Pilot will log the update failure and reboot to retry, but will not be able to retrieve the correct package versions. The /var/log/sw_ec.log on the REPLACEMENT Pilot CU will have entries indicating that it is trying to update by copying the RPMs from the buddy, but cannot find the packages:

    Nov 27 19:29:30.658 localhost SW_EC: ec:info The versions do not compare.

Solution
You will need the Software Package RPM for the Installed Release.

To stop the repeated reboot of the replacement Pilot CU, you must terminate the pilotcfg service on the new Pilot CU. You may be able to do this with an SSH command from the surviving Pilot CU: ping the replacement, then issue the command to stop the pilotcfg service as soon as you see a ping response.

If that does not work, you may need to create a small shell script on the surviving Pilot CU to stop the service on the replacement CU. Go to /var/tmp on the surviving Pilot CU and create a small shell script named "stopit". The example shown assumes that the replacement CU is pilot1. If the replacement CU is pilot2, change the pilot name.

    # Retry once per second until interrupted with Ctrl-C
    while true; do
        ssh -o ConnectTimeout=1 pilot1 service pilotcfg stop
        sleep 1
    done

Make the script executable with "chmod 777 stopit", then run the script with ./stopit. If the script is able to shut down pilotcfg on the replacement Pilot, you will see the pcp_monitor and pilotcfg shutting down:

    [root@pilot2 tmp]# ./stopit

Stop the script with Ctrl-C, and verify that the reboots on the replacement CU have stopped.

Upload and extract the RPM for the Installed Release on the surviving Pilot CU:
This should allow the replacement Pilot CU to retrieve the correct version of the files and complete its software sync.

DO NOT ATTEMPT TO TEST THE REPLACEMENT PILOT FOR AT LEAST 1 HOUR!!!
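One way to confirm the diagnosis, or to see whether new mismatch entries keep appearing, is to count the "The versions do not compare" messages in a copy of sw_ec.log taken from the replacement Pilot CU. This is only a sketch under stated assumptions: the function name `count_version_mismatches` and the `SW_EC_LOG` variable are hypothetical, and the only log text relied on is the message quoted in this document.

```shell
#!/bin/sh
# Hypothetical helper (not an Axiom tool): print how many lines of the given
# log file contain the version-mismatch message quoted in this document.
# $1 is the path to a copy of sw_ec.log from the replacement Pilot CU.
count_version_mismatches() {
    # grep -c prints the number of matching lines (0 when there are none).
    grep -c 'The versions do not compare' "$1"
}

# SW_EC_LOG is an assumed variable naming where the log copy was saved.
log="${SW_EC_LOG:-./sw_ec.log}"
if [ -f "$log" ]; then
    echo "Version mismatch entries in $log: $(count_version_mismatches "$log")"
fi
```

Running it twice a few minutes apart and comparing the counts gives a rough indication of whether the replacement CU is still retrying the failed update.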
References
<NOTE:1609199.1> - Pillar Axiom: How to Perform a Double Pilot Replacement in R5
<NOTE:1539019.1> - Pillar Axiom: How to Replace a Pilot Control Unit
<NOTE:1577565.1> - Pillar Axiom: Axiom 300, 500, 600 Pilot CU Substitution Matrix

Attachments
This solution has no attachment