Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1952478.1
Update Date:2018-02-01
Keywords:

Solution Type  Problem Resolution Sure

Solution  1952478.1 :   Pillar Axiom: Pilot Control Unit Replacement Fails if Staged Software Version is Higher than Installed Software Version  


Related Items
  • Pillar Axiom 600 Storage System
Related Categories
  • PLA-Support>Sun Systems>DISK>Axiom>SN-DK: Ax600


Pilot Control Unit replacement on R5.x systems will fail if the staged software version is higher than the installed software version. 
The cause is an architectural issue solved only in R6.

In this Document
Symptoms
 Verifying this Staged Software Issue as Cause
Changes
Cause
Solution
References


Oracle Confidential PARTNER - Available to partners (SUN).
Reason: Requires Pilot OS shell access to resolve.

Applies to:

Pillar Axiom 600 Storage System - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

On a Pilot Control Unit replacement, where the replacement CU is imaged to Release 5.4, the replacement pilot may reboot continuously, typically 20-30 seconds after the root prompt appears. 

Verifying this Staged Software Issue as Cause

The symptoms of this mismatched software issue are:

  • The replacement Pilot boots, stays up for approximately 20-30 seconds, then reboots, and repeats this cycle. This behavior has other causes that are not covered in this document.
  • The Staged Software and Compatibility Matrix version displayed in the GUI will be higher than the Installed Software.
  • Pilot logs show failed attempts to sync software versions. 
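The version comparison in the second symptom can be sketched as a small shell helper. This is an illustration only: the function name staged_is_higher is hypothetical, the dotted version strings (such as 5.4.0) are an assumed format, and it relies on GNU sort -V, so it is meant for an administrator workstation rather than the Pilot itself.

```shell
# Hypothetical helper: succeeds (exit 0) when the staged version is
# strictly higher than the installed version.
# Assumes dotted version strings and GNU sort with -V (version sort).
staged_is_higher() {
    installed=$1
    staged=$2
    # Equal versions are not a mismatch.
    [ "$installed" = "$staged" ] && return 1
    # Version-sort both strings; the last line is the higher one.
    highest=$(printf '%s\n%s\n' "$installed" "$staged" | sort -V | tail -n 1)
    [ "$highest" = "$staged" ]
}
```

For example, "staged_is_higher 5.4.0 5.4.5 && echo mismatch" would print "mismatch" for the versions read from the GUI.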

Changes

Replacement Pilots for all Ax600 systems at 5.2.0 or higher should be 1450-00268-32 or 1450-00314-00. 1450-00268-32 is an A1811E Pilot imaged to release 5.4.0, and 1450-00314-00 is an A1811 Pilot imaged to release 5.4.0.

If the system was upgraded from R4 to R5, the 1450-00314-00 A1811 Pilot can be used for replacement. Although it is a valid replacement, replacing an A1811E Pilot CU (1450-00268-32) with the 1450-00314-00 A1811 Pilot is not recommended.

Doing so downgrades the Pilot hardware configuration, which may not be acceptable to a customer who expects the faster A1811E Pilot.

Release 5.4.0 modified the method used for Pilot CU replacement to resolve multiple Severity 1 Bugs in lower releases. The replacement Pilot CU will check the Installed Software version on the surviving Pilot CU using an rpm database query, then the replacement CU will attempt to scp the RPMs for that installed software from the surviving Pilot.

There is an architectural flaw in this design: R5.x Pilots do not store the Installed software. They store only the Staged software, in /var/unpacked. The /var/installed directory contains only the files used for replacing Slammer or Brick components.

If the R5 system is at release 5.3.x or below, do not replace the failed Pilot with either the 1450-00268-32 A1811E Pilot or the 1450-00314-00 A1811 Pilot, as this can cause the entire system to go down. On R5 systems below release 5.4.x, a double Pilot replacement is required.

Cause

Reference Bug 20109721

As the replacement Pilot boots, it will check its installed RPM versions and compare them to the ones on the surviving Pilot CU.  If the versions do not match, the replacement Pilot CU will attempt to copy the main RPMs from /var/unpacked on the surviving Pilot.   If a higher software version has been staged, the necessary RPMs will no longer be in /var/unpacked on the surviving Pilot CU.  The replacement pilot will log the update failure and reboot to retry, but will not be able to retrieve the correct package versions. 
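The condition described above (the RPMs for the installed release are missing from /var/unpacked) could be checked on the surviving Pilot CU with a sketch like the following. The function name unpacked_has_package is hypothetical, and the file naming (name-version-release.i386.rpm) and the package name pillar-pilot-apps are assumptions inferred from the RPM file names that appear in this document.

```shell
# Hypothetical helper: succeeds if the directory holds the RPM file for
# the given name-version-release string.
# Assumes files are named <name-version-release>.i386.rpm.
unpacked_has_package() {
    dir=$1
    nvr=$2
    [ -f "$dir/$nvr.i386.rpm" ]
}

# Possible usage on the surviving Pilot CU (package name is an assumption):
#   nvr=$(rpm -q --qf '%{NAME}-%{VERSION}-%{RELEASE}' pillar-pilot-apps)
#   unpacked_has_package /var/unpacked "$nvr" || echo "installed RPM not staged"
```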

The /var/log/sw_ec.log on the REPLACEMENT Pilot CU will have entries indicating that it is trying to update by copying the RPMs from the buddy, but cannot find the packages:

Nov 27 19:29:30.658 localhost SW_EC: ec:info The versions do not compare.  Starting update...
Nov 27 19:29:30.660 localhost SW_EC: ec:info Retrieving RPMs from buddy: 172.30.80.2 ...
Nov 27 19:29:30.662 localhost SW_EC: ec:info scp -q 172.30.80.2:/var/unpacked/*.xml /var/unpacked/.
Nov 27 19:29:30.775 localhost SW_EC: ec:info scp -q 172.30.80.2:/var/unpacked/pillar-raid-072318-072318.i386.rpm /var/unpacked/.
Nov 27 19:29:30.900 localhost SW_EC: ec:info Retrieving 172.30.80.2:/var/unpacked/pillar-raid-072318-072318.i386.rpm failed: 256
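To pick the failed retrievals out of a longer log, a filter along these lines may help. It is a sketch that assumes the failure message always has the form shown in the excerpt above ("<path>.rpm failed: <code>"); the function name failed_rpms is hypothetical.

```shell
# Hypothetical helper: print the unique RPM file names whose retrieval
# failed, based on the "<path>.rpm failed: <code>" message format.
failed_rpms() {
    # $1: path to sw_ec.log (or a copy of it)
    sed -n 's|.*/\([^/ ]*\.rpm\) failed: [0-9]*.*|\1|p' "$1" | sort -u
}
```

Running "failed_rpms /var/log/sw_ec.log" on the replacement Pilot CU would list each package that could not be copied from the buddy.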

 

Solution

You will need the Software Package RPM for the Installed Release. 

To stop the repeated Pilot CU reboot on the replacement Pilot, you must terminate the pilotcfg service on the new Pilot CU.   You may be able to do this with an SSH command from the surviving Pilot CU by pinging the replacement, then issuing the command to stop the pilotcfg service as soon as you see a ping response. 
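The ping-then-stop approach could be sketched as the function below, run from the surviving Pilot CU. This is an illustration under assumptions: the function name stop_pilotcfg_when_up and the attempt limit are hypothetical, and the ping options (-c count, -W timeout in seconds) are those of the Linux iputils ping.

```shell
# Hypothetical helper: wait for the replacement CU to answer a ping, then
# try to stop pilotcfg over SSH. Gives up after a number of attempts.
stop_pilotcfg_when_up() {
    host=$1
    attempts=${2:-300}   # assumed default: roughly five minutes of retries
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # One ping with a one-second timeout (Linux iputils options).
        if ping -c 1 -W 1 "$host" >/dev/null 2>&1; then
            # The host answers; stop the service before it reboots again.
            if ssh -o ConnectTimeout=1 "$host" service pilotcfg stop; then
                return 0
            fi
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}
```

For example, "stop_pilotcfg_when_up pilot1" would target a replacement CU named pilot1.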

If that does not work, you may need to create a small shell script on the surviving Pilot CU to stop the service on the replacement CU:

Go to /var/tmp on the surviving Pilot CU, and create a small shell script named "stopit". The example shown assumes that the replacement CU is pilot1. If the replacement CU is pilot2, change the pilot name.

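#!/bin/sh
# Retry once per second until interrupted with Ctrl-C.
while true; do
        # Stop pilotcfg on the replacement CU (pilot1 here) as soon as SSH answers.
        ssh -o ConnectTimeout=1 pilot1 service pilotcfg stop
        sleep 1
done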

Make the script executable with "chmod +x stopit", then run the script with ./stopit.

If the script is able to shut down pilotcfg on the replacement pilot, you will see the pcp_monitor and pilotcfg shutting down:

[root@pilot2 tmp]# ./stopit
Shutting down pcp_monitor:
Shutting down pilotcfg:

Stop the script with Ctrl-C, and verify that the reboots on the replacement CU have stopped.
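One possible way to verify that the reboot loop has stopped is to read the replacement's uptime over SSH. The sketch below assumes the Pilot exposes /proc/uptime (as Linux systems do); the function names and the threshold are hypothetical. Since the reboots occur 20-30 seconds after boot, an uptime well beyond that suggests the loop has ended.

```shell
# Hypothetical helper: print the remote uptime in whole seconds.
remote_uptime() {
    ssh -o ConnectTimeout=5 "$1" cat /proc/uptime | cut -d. -f1
}

# Hypothetical helper: succeed if the host has been up longer than the
# given number of seconds. Fails if the uptime cannot be read.
up_longer_than() {
    secs=$(remote_uptime "$1")
    [ -n "$secs" ] && [ "$secs" -gt "$2" ]
}
```

For example, "up_longer_than pilot1 120 && echo stable" would print "stable" once pilot1 has stayed up for more than two minutes.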

Upload and extract the RPM for the Installed Release on the surviving Pilot CU: 

  1. On the surviving Pilot CU, cd /var/tmp and mkdir PKG
  2. Using scp or WinSCP, copy the rpm to /var/tmp/PKG:   scp AxiomONE-SW-050405-009400.i386.rpm root@yourAxiom_sharedIP:/var/tmp/PKG
  3. cd /var/tmp/PKG and extract the main RPM with rpm2cpio and cpio:
    $ rpm2cpio AxiomONE-SW-050405-009400.i386.rpm | cpio -idmv
    ./var/unpacked
    ./var/unpacked/Compatibility.xml
    ./var/unpacked/pillar-fcraid-012318-012318.i386.rpm
    ./var/unpacked/pillar-fcraid2-022318-022318.i386.rpm
    ./var/unpacked/pillar-pilot-apps-050405-009400.i386.rpm
    ./var/unpacked/pillar-pilot-os-050400-003900.i386.rpm
    ./var/unpacked/pillar-raid-072318-072318.i386.rpm
    ./var/unpacked/pillar-raid2-002318-002318.i386.rpm
    ./var/unpacked/pillar-slammer-ax300-050405-009400.i386.rpm
    ./var/unpacked/pillar-slammer-ax500-050405-009400.i386.rpm
    ./var/unpacked/pillar-slammer-ax600-050405-009400.i386.rpm
    ./var/unpacked/pillar-slammer-boot-030000-015001.i386.rpm
    ./var/unpacked/pillar-slammer-boot-ax600-050000-040000.i386.rpm
  4. cd /var/unpacked and remove all existing RPM files with rm -f *.rpm
  5. cd /var/tmp/PKG/var/unpacked/ and copy all of the RPMs to /var/unpacked with cp *.rpm /var/unpacked
  6. cd /var/unpacked and change the file permissions with chmod 644 *.rpm 
  7. Reboot the replacement Pilot CU with "ssh pilot1 reboot &" and then press enter to get the command prompt again. 
    (If the replacement is pilot2, change the pilot name in the above command to pilot2)
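Steps 4 through 6 above can be sketched as a single function. This is not part of the documented procedure: the function name restage_rpms is hypothetical, and the check that the source directory actually contains RPMs before anything is deleted is an added safeguard.

```shell
# Hypothetical helper combining steps 4-6: replace the RPMs in the
# destination directory with those from the source directory.
restage_rpms() {
    src=$1
    dst=$2
    # Added safeguard: refuse to delete anything unless src holds RPMs.
    set -- "$src"/*.rpm
    [ -e "$1" ] || { echo "no RPMs found in $src" >&2; return 1; }
    rm -f "$dst"/*.rpm                      # step 4: remove existing RPMs
    cp "$src"/*.rpm "$dst"/ || return 1     # step 5: copy the extracted RPMs
    chmod 644 "$dst"/*.rpm                  # step 6: set file permissions
}
```

On the surviving Pilot CU this would be called as "restage_rpms /var/tmp/PKG/var/unpacked /var/unpacked".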

This should allow the replacement Pilot CU to retrieve the correct version of the files and complete its software sync.

DO NOT ATTEMPT TO TEST THE REPLACEMENT PILOT FOR AT LEAST 1 HOUR!!!  
A full pilot to pilot sync is required in order to prepare the replacement properly.  This may take up to an hour. 

 

 

References

<NOTE:1609199.1> - Pillar Axiom: How to Perform a Double Pilot Replacement in R5
<NOTE:1539019.1> - Pillar Axiom: How to Replace a Pilot Control Unit
<NOTE:1577565.1> - Pillar Axiom: Axiom 300, 500, 600 Pilot CU Substitution Matrix

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.