Asset ID: 1-71-1609199.1
Update Date: 2017-08-30
Solution Type: Technical Instruction Sure
Solution 1609199.1: Pillar Axiom: How to Perform a Double Pilot Replacement in R5
Related Items
- Pillar Axiom 600 Storage System
- Pillar Axiom 500 Storage System
Related Categories
- PLA-Support>Sun Systems>DISK>Axiom>SN-DK: Ax600
Oracle Confidential PARTNER - Available to partners (SUN).
Reason: To understand how to replace two A1811 Enhanced Pilots
Applies to:
Pillar Axiom 500 Storage System - Version All Versions and later
Pillar Axiom 600 Storage System - Version All Versions and later
Information in this document applies to any platform.
In some cases customer support will have to replace both Pilot CUs in an Oracle Pillar Axiom Storage System. The cases could be:
- Corruption of pilot images
- Corruption of configuration files
- Corruption of SSH files
- Hardware failures that affect the software image on both pilot CUs.
- Failed Pilot CU replacement
Even if only one Pilot CU has corrupted SSH settings or SSH keys, both Pilot CUs must be replaced, since we cannot verify the integrity of either CU.
Goal
A double Pilot CU replacement is disruptive; however, data is preserved.
This document ONLY applies to Axioms running on R5.2 or higher, and only the A1811 Enhanced Pilots can be used as replacements.
!!!IMPORTANT!!!
Customer Support should always request assistance from Engineering if there is a question on pilot integrity or corruption. Customer Support should not initiate double pilot replacement without Engineering investigating first.
Solution
Checklist of items to complete before implementing the double pilot CU replacement:
- A Service Request should already exist for this activity; if not, create one.
- A Field Service task needs to be created. Order two new A1811 Enhanced Pilots part # 1450-00268-xx, and verify the task has been assigned to a Field Engineer who will complete this procedure.
NOTE: This procedure replaces both Pilot CUs on an R5 Axiom using two 1450-00268-xx A1811 Enhanced Pilots. It is supported only if the Axiom is on release 5.2.0 or higher. For more information about Pilot CU substitution, refer to Doc ID 1577565.1.
- Make sure the customer is aware that the double pilot replacement procedure is disruptive, as it requires that ALL SAN hosts be shut down.
NOTE: Plan for about 1.5 hours of downtime: about 30 minutes to remove the existing pilots and install the new pilots, then up to 1 hour to complete the Axiom restart (including upgrading slammers and bricks if necessary).
- Verify the Axiom system serial number which starts with A00... and is viewable in the GUI by navigating to the "Configure" tab -> "Summary" -> "System" -> in the main page "Serial Number:". Or you can get this information from the Service Request owner who can check the logs.
- Verify the Axiom network settings. You MUST log into the GUI using an account that has administrator privileges, then navigate to "Configure" tab -> under "Global Settings" click "Networking". In the main screen, right-click and choose "Modify Network Settings"; the next window displays the current Axiom network settings. Or you can get this information from the Service Request owner, who can check the logs.
- The Service Request owner will provide the Axiom R5 RPM software package for the Field Engineer to download (the package can be made available through ARU and downloaded from support.oracle.com).
- Preferred: Provide the current recommended GA release.
- Alternative: Provide the exact release as currently installed. This will work for those customers who cannot upgrade for reasons beyond their control.
- A keyboard and DVI compatible display are required to prepare the new pilots. If a DVI display is not available, a VGA to DVI Adapter may be used. The adapter is available from the FE Handbook under part # 530-3474-01 and others.
The DVI cable connects to the blue DVI port on the back of the A1811 Enhanced Pilot.
NOTE: If a DVI compatible display is not an option, the alternative is to connect a laptop/PC directly to the SVC (service) port on the pilot. The SVC port exists ONLY on the A1811 Enhanced Pilots. It is an Ethernet port on the back of the pilot, to the left of the MGMT port, which is on the far right. The IP address of the SVC port is 5.0.0.1, and it can be used to access the pilot console and the GUI. Use a standard Ethernet cable to connect a laptop/PC NIC to the SVC port, configure the host NIC with any 5.0.0.x IP address except 5.0.0.1 (for example 5.0.0.5), and set the gateway of the host NIC to 5.0.0.1. To test connectivity, open a command prompt and ping 5.0.0.1. Then use PuTTY (or a similar SSH client) to connect to the SVC port IP 5.0.0.1. If you do not get a login prompt, SSH must be enabled; instructions to enable SSH can be found in Doc ID 1431693.1. Access to the GUI should work on the active pilot.
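The laptop NIC settings described in the note above can be sanity-checked before cabling up. The following is a minimal sketch meant to run on the laptop, not on the pilot. It assumes that "5.0.0.x" means a /24 network; the function name check_laptop_nic is hypothetical and not part of any Axiom tool.

```python
import ipaddress

SVC_PORT_IP = ipaddress.ip_address("5.0.0.1")     # SVC port address given in the note above
SVC_NETWORK = ipaddress.ip_network("5.0.0.0/24")  # assumption: "5.0.0.x" means a /24 network


def check_laptop_nic(ip, gateway):
    """Return a list of problems with the proposed laptop NIC settings (empty if none)."""
    problems = []
    addr = ipaddress.ip_address(ip)
    if addr not in SVC_NETWORK:
        problems.append(f"{ip} is not a 5.0.0.x address")
    elif addr == SVC_PORT_IP:
        problems.append(f"{ip} is the SVC port's own address")
    elif addr in (SVC_NETWORK.network_address, SVC_NETWORK.broadcast_address):
        problems.append(f"{ip} is the network or broadcast address")
    if ipaddress.ip_address(gateway) != SVC_PORT_IP:
        problems.append(f"gateway should be {SVC_PORT_IP}, not {gateway}")
    return problems


print(check_laptop_nic("5.0.0.5", "5.0.0.1"))  # prints []
```

An empty list means the settings match the note; the ping test to 5.0.0.1 is still the real confirmation that the link works.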
PREPARE THE PILOTS:
NOTE: This ENTIRE section (A, B, and C) SHOULD be done in advance of installing the new pilots, and can be done ANYWHERE: at the customer site, at home, in an office, etc.
NOTE: The two new Pilot CUs MUST NOT be connected to the Slammers as they are prepared for replacement. The new pilots will arrive as "blank" pilots imaged with 5.2.0 or higher with a System Serial Number of 9999999999 and default network settings. This MUST all be changed before connecting the new pilots to the Slammers.
- Manually configure the two new A1811 Enhanced Pilots:
- Remove the two new A1811 Enhanced Pilots from their shipping containers and stack them on top of each other. Do NOT apply power at this point.
- Connect the serial cross-over cable between the two pilots.
NOTE: On both pilots, use the bottommost serial port (9-pin DB-9), which is labeled "Serial Port" just below the port. Do NOT use the top serial port.
- Connect an ethernet cable between the ETH0 ports of the two pilots.
- Connect the MGMT port (management port) interface to the network for both pilots (the MGMT port is located far right-side facing the back of the pilot).
- Attach the keyboard and display to the top Pilot CU.
- Both pilots should have NO cables plugged into the ETH1 ports at this point.
NOTE: The ETH1 ports are used to connect the pilots to the slammers. The pilots should have NO connection to the slammers at this point.
- Apply power to the TOP Pilot CU only.
- Log into the top pilot with user root and password a1s2d3f$
- The root prompt on the top pilot should become "[root@pilot1 root]#" and hostname MUST be "pilot1"
- Use vi to modify the /etc/system_serial_number file to set the System Serial Number to that of the Axiom.
- Modify the /var/lib/pillar/pcp/pilot-config.xml file to set up basic networking. Your values will differ from those in the following example:
<?xml version="1.0" encoding="UTF-8"?>
<pilot_config>
  <network_common>
    <netmask>255.255.252.0</netmask>          <= Set netmask to local value
    <gateway>10.79.252.1</gateway>            <= Set the gateway to local value
    <dhcp_enabled>false</dhcp_enabled>        <= Disable DHCP
    <duplex>auto</duplex>                     <= Set management Ethernet interface to auto-negotiate
    <domain/>
  </network_common>
  <pilot1>
    <ip_address>10.79.255.24</ip_address>     <= Set to pilot1 explicit IP address
  </pilot1>
  <ha_resource>
    <ip_address>10.79.255.23</ip_address>     <= Set to the main shared Axiom IP address
  </ha_resource>
  <pilot2>
    <ip_address>10.79.255.25</ip_address>     <= Set to pilot2 explicit IP address
  </pilot2>
  <DNS>
    <nameserver></nameserver>                 <= You do not need a name server, leave blank.
  </DNS>
  <NTP>
    <server></server>
    <server></server>
    <server></server>
    <use_ntp>false</use_ntp>                  <= You do not need NTP. Set to false and leave servers blank.
  </NTP>
NOTE: Since the new pilots may be connected to a different network than the existing pilots, it may be necessary to use different IP addresses, which is OK. However, make sure the IPs used to configure the new pilots are not already in use on the customer's network; otherwise there will be an IP conflict when the new pilots are installed. Later in this document, after the Axiom has been recovered, the IP addresses can be changed in the GUI to match the existing pilot IP addresses.
NOTE: Changes to this file will not become effective until the pilot-config process becomes active. This will take several minutes after the Pilot OS boots. See additional NOTES below.
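Before rebooting, the edited pilot-config.xml values can be cross-checked offline, for example on a laptop holding a copy of the file. The sketch below is illustrative only. It assumes the element layout shown in the example above, adds a closing </pilot_config> tag so the sample parses as a complete document, and treats "all three addresses are distinct and in the gateway's subnet" as a general networking sanity check rather than a documented Axiom requirement. The function name check_pilot_config is hypothetical.

```python
import ipaddress
import xml.etree.ElementTree as ET

# Example values copied from the listing above; the closing </pilot_config>
# tag is an assumption added so the sample is a complete XML document.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<pilot_config>
  <network_common>
    <netmask>255.255.252.0</netmask>
    <gateway>10.79.252.1</gateway>
    <dhcp_enabled>false</dhcp_enabled>
  </network_common>
  <pilot1><ip_address>10.79.255.24</ip_address></pilot1>
  <ha_resource><ip_address>10.79.255.23</ip_address></ha_resource>
  <pilot2><ip_address>10.79.255.25</ip_address></pilot2>
</pilot_config>"""


def check_pilot_config(xml_bytes):
    """Return a list of problems found in a pilot-config.xml document (empty if none)."""
    root = ET.fromstring(xml_bytes)
    common = root.find("network_common")
    netmask = common.findtext("netmask", default="").strip()
    gateway = ipaddress.ip_address(common.findtext("gateway", default="").strip())
    # strict=False lets us derive the subnet from a host address plus netmask.
    network = ipaddress.ip_network(f"{gateway}/{netmask}", strict=False)

    problems = []
    if common.findtext("dhcp_enabled", default="").strip() != "false":
        problems.append("dhcp_enabled should be false")

    addrs = {}
    for tag in ("pilot1", "ha_resource", "pilot2"):
        text = root.findtext(f"{tag}/ip_address", default="").strip()
        addrs[tag] = ipaddress.ip_address(text)
        if addrs[tag] not in network:
            problems.append(f"{tag} address {text} is outside {network}")
    if len(set(addrs.values())) != len(addrs):
        problems.append("pilot1, ha_resource and pilot2 addresses must be distinct")
    if gateway in addrs.values():
        problems.append("gateway address is reused by a pilot or ha_resource entry")
    return problems


print(check_pilot_config(SAMPLE))  # prints []
```

This check cannot tell whether an address is already in use on the customer's network; that still has to be confirmed with the customer.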
- Type "reboot" to reboot pilot1 [pilot2 is still powered off]
- Log in as root and make sure that the hostname is pilot1 and the shell prompt is "[root@pilot1 root]#".
- Type "cat /etc/system_serial_number" and make sure that it matches the Axiom serial number and is NOT 9999999999.
- Type "ifconfig -a" and verify the following:
- Make sure the pilot IP addresses for eth0 and eth0:0 match the pilot1 and ha_resource addresses entered above in Step 11 (the pilot-config.xml edit).
NOTE: The Pilot CU must be connected to an external network and the link MUST be up for the shared IP address to appear. The change to these addresses will NOT take place until the pilot becomes active. You can tell if the Pilot CU is active by looking for pmi_if:0 to show 172.30.80.1. This does take a few minutes, so be patient before checking access to the changed IP addresses. They will not change to the new values until the pilot config process makes this the active Pilot CU.
- Verify that the IP address for pmi_if is set to 172.30.80.2, which is the standard internal PMI IP address for pilot1.
- Make sure that the IP address 172.30.80.1 exists for pmi_if:0. This address is the shared address used ONLY by the active Pilot CU.
NOTE: The Pilot CU must be connected to an external network and the link MUST be up for the 172.30.80.1 IP address to appear.
- To enable SSH access for remote verification or assistance:
- Type "service sshd status", which should indicate the openssh-daemon is running.
- Then cd to the /etc/sysconfig/pillar/ folder and type "./open-port.sh 22" to enable SSH on this Pilot CU. This command must be repeated each time the Pilot CU is rebooted.
- Label the top pilot CU0.
- If all these checks pass, then pilot1 has been successfully prepared and you are ready to start with pilot2. Leave pilot1 powered on.
- Move the keyboard and display to the bottom Pilot CU. Make sure the serial cable and Ethernet cable between the two pilot CUs are securely connected.
- Power on the bottom Pilot CU.
- The Pilot CU negotiation on the serial and ETH0 connections should make this pilot2 and it SHOULD NOT change the serial number file on pilot1 as pilot2 boots. You may see this pilot reboot at least once.
- As soon as the login prompt appears, log in as user root.
- Make sure that the hostname is pilot2 and the shell prompt is "[root@pilot2 root]#"
- Type "ifconfig -a" and verify the following:
- The IP address for pmi_if should become 172.30.80.3
- There should NOT be a pmi_if:0 address of 172.30.80.1, since pilot1 is still the active pilot.
- After the bottom pilot becomes pilot2, with a pmi_if IP address of 172.30.80.3, proceed to set the system serial number and network configuration.
- Edit the /etc/system_serial_number file to set the Axiom System Serial Number. This must match the pilot1 entry.
- Edit the /var/lib/pillar/pcp/pilot-config.xml file, or copy it from pilot1 by running the command "scp root@pilot1:/var/lib/pillar/pcp/pilot-config.xml /var/lib/pillar/pcp/". Then make sure the ha_resource, netmask, gateway, pilot1 and pilot2 entries are correct.
- Type "reboot" to reboot pilot2. After it reboots, check the /etc/system_serial_number file and the /var/lib/pillar/pcp/pilot-config.xml file, and then type "ifconfig -a" to make sure that the IP address for eth0 is the external IP address for pilot2.
NOTE: The serial number and networking entries must be checked and corrected if necessary.
- Label the bottom pilot CU1.
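The "ifconfig -a" checks in this section reduce to three facts: pmi_if is 172.30.80.2 on pilot1 and 172.30.80.3 on pilot2, and only the active Pilot CU carries 172.30.80.1 on pmi_if:0. The sketch below shows that logic applied to saved output. It is an illustration only: the sample text is made up, the exact "ifconfig" output format on the pilot is an assumption (the regular expression accepts both "inet addr:x.x.x.x" and "inet x.x.x.x"), and the function names are hypothetical.

```python
import re

# Matches IPv4 lines in either classic ("inet addr:1.2.3.4") or newer ("inet 1.2.3.4") style.
INET_RE = re.compile(r"inet (?:addr:)?(\d+\.\d+\.\d+\.\d+)")

PMI_ADDRESSES = {"172.30.80.2": "pilot1", "172.30.80.3": "pilot2"}  # from this document
ACTIVE_SHARED_PMI = "172.30.80.1"                                   # from this document


def parse_ifconfig(text):
    """Map interface name -> IPv4 address from saved ifconfig output."""
    addrs = {}
    iface = None
    for line in text.splitlines():
        if line and not line[0].isspace():
            # An interface block starts on a non-indented line.
            iface = line.split()[0].rstrip(":")
        match = INET_RE.search(line)
        if match and iface:
            addrs[iface] = match.group(1)
    return addrs


def pilot_role(addrs):
    """Return (pilot name, is_active) based on the pmi_if and pmi_if:0 addresses."""
    name = PMI_ADDRESSES.get(addrs.get("pmi_if"), "unknown")
    active = addrs.get("pmi_if:0") == ACTIVE_SHARED_PMI
    return name, active


# Made-up sample in classic net-tools layout (an assumption, not captured from a pilot).
SAMPLE_PILOT1 = """\
eth0      Link encap:Ethernet  HWaddr 00:00:00:00:00:01
          inet addr:10.79.255.24  Bcast:10.79.255.255  Mask:255.255.252.0
eth0:0    Link encap:Ethernet  HWaddr 00:00:00:00:00:01
          inet addr:10.79.255.23  Bcast:10.79.255.255  Mask:255.255.252.0
pmi_if    Link encap:Ethernet  HWaddr 00:00:00:00:00:02
          inet addr:172.30.80.2  Bcast:172.30.80.255  Mask:255.255.255.0
pmi_if:0  Link encap:Ethernet  HWaddr 00:00:00:00:00:02
          inet addr:172.30.80.1  Bcast:172.30.80.255  Mask:255.255.255.0
"""

print(pilot_role(parse_ifconfig(SAMPLE_PILOT1)))  # prints ('pilot1', True)
```

On pilot2 while pilot1 is active, the same function would be expected to report pilot2 as not active, since pmi_if:0 should be absent there.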
- Update the new A1811 Enhanced Pilots using the R5 RPM software package provided by Support:
- Download the Axiom Storage Management GUI software and install it on the desktop. From the desktop open a web-browser and type in the cluster ip address or the ip address of the active pilot in the URL, navigate to "Management Software" tab -> under "Windows Installer (Recommended)" download the msi file and install the software.
- Log into the Axiom GUI using the pillar user account (the password should be "pillar" or "a1s2d3f$"). You MUST login with the pillar user account.
NOTE: In the GUI you should not see any slammers or bricks. If you navigate to "Monitor" tab -> "Hardware" -> "Pilots" screen, you should see the two pilots. If both pilots are shown as green/normal OR in a warning state, continue on to the next step. If a pilot is shown in a critical state, something is wrong; contact Support before proceeding to the next step.
- Navigate to "Support" tab -> "Software Modules" -> in the main screen you will see "Installed Software" and "Staged Software". The "Installed Software" should show a package version that is lower in version than the package to be staged. Click on "Upload Software Package..." and choose the RPM file to stage and click "Ok".
- Wait for the staging process to complete. Click the "Tasks" button in the bottom right to view all running tasks. You should see a PerformPackageStaging task running; wait for it to complete.
- Navigate back to "Support" tab -> click "Software Modules" and verify that the Staged Software version is newer than the Installed Software.
- In the main screen of "Software Modules", right click and choose "Update Software".
- From the "Update Software" screen choose "Always Install" for ALL Software Modules, select radio button "Restart and update software (disrupts data access)", and check all boxes under "Software Update Options".
NOTE: You must be logged into the GUI with the pillar user account to view the "Install Action" column.
- Click "Ok" to start the upgrade. There will be additional pop-up windows that the user must acknowledge to proceed with the upgrade; this is normal.
- Wait for the upgrade to complete; it should not take long. During the upgrade, your GUI login session will terminate. Open a command prompt and start pinging the pilots.
- Once the pilots are responsive, repeat Step 1 of this section to download and install the new Axiom GUI software (as the pilots should now be on the newer software version).
- Log into the pilots with the pillar user account and navigate to "Monitor" tab -> "Hardware" -> "Pilots" screen. You should see both pilots on the newer server version.
- Check the System Serial Number and make sure it is correct. In the GUI navigate to "Configure" tab -> "Summary" -> "System"; the serial number is viewable in the main screen.
- Check that the Network Settings look correct. Log into the GUI with the administrator account; the password is pillar. Navigate to "Configure" tab -> under "Global Settings" click "Networking". In the main screen right-click and choose "Modify Network Settings", and make sure the settings look correct. Some of the items may be missing; these will be filled in later, after the Axiom has been recovered and is running normally. You can't modify the network settings in the GUI at this point.
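The serial number checks made during preparation (the /etc/system_serial_number file on each pilot and the value shown in the GUI) come down to a few comparisons. The sketch below captures them under stated assumptions: the serial values are typed in by hand, the example serial is made up, and the function name serial_problems is hypothetical.

```python
PLACEHOLDER_SERIAL = "9999999999"  # value on a "blank" replacement pilot, per this document


def serial_problems(axiom_serial, pilot1_serial, pilot2_serial):
    """Return a list of serial number problems (empty if both pilots match the Axiom)."""
    problems = []
    expected = axiom_serial.strip()
    if not expected.startswith("A00"):
        problems.append(f"Axiom serial {expected!r} does not start with A00")
    for name, value in (("pilot1", pilot1_serial), ("pilot2", pilot2_serial)):
        value = value.strip()
        if value == PLACEHOLDER_SERIAL:
            problems.append(f"{name} still has the placeholder serial number")
        elif value != expected:
            problems.append(f"{name} serial {value!r} does not match {expected!r}")
    return problems


# "A001234XYZ" is a made-up serial used only for illustration.
print(serial_problems("A001234XYZ", "A001234XYZ\n", "9999999999\n"))
```

The trailing newlines in the usage line mimic text read from a file; they are stripped before comparison.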
- Clean Up from the Forced Upgrade:
NOTE: Since the upgrade was done with just the Pilot CUs active, there may be artifacts left over from the upgrade that could prevent a successful system restart. You will need to remove these files from BOTH Pilot CUs.
- The Axiom keeps upgrade status in an XML file in /var/PillarPilotPersistence/com.pillardata.pmi.message.SoftwareUpdateStatusInfo/ folder. Delete any XML files in this directory. This must be done on both Pilot CUs.
- The Axiom keeps the originally installed software packages in /var/installed/ folder during any upgrade, in case the pilot software needs them to recover from a failed upgrade. Delete all files and directories in /var/installed/ folder, leaving the /var/installed/ folder intact. This must be done on both Pilot CUs.
- Reboot both Pilot CUs so they will recreate the files with normal information.
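The two deletions in this section have slightly different scopes: only XML files are removed from the update-status folder, while everything inside /var/installed/ is removed and the folder itself is kept. The sketch below spells out that selection logic with a dry-run default. It is illustrative only: whether Python is available on the pilot is not stated in this document, and the function names cleanup_targets and clean are hypothetical. The two paths are the ones named in the steps above.

```python
import shutil
from pathlib import Path

# Paths as given in the cleanup steps above.
STATUS_DIR = Path("/var/PillarPilotPersistence/com.pillardata.pmi.message.SoftwareUpdateStatusInfo")
INSTALLED_DIR = Path("/var/installed")


def cleanup_targets(status_dir=STATUS_DIR, installed_dir=INSTALLED_DIR):
    """List what the cleanup would remove; the two folders themselves are never listed."""
    targets = []
    if status_dir.is_dir():
        targets += sorted(status_dir.glob("*.xml"))  # only XML files in the status folder
    if installed_dir.is_dir():
        targets += sorted(installed_dir.iterdir())   # every file and directory inside
    return targets


def clean(targets, dry_run=True):
    """Delete the listed paths, or only print them when dry_run is True."""
    for path in targets:
        if dry_run:
            print("would delete", path)
        elif path.is_dir() and not path.is_symlink():
            shutil.rmtree(path)
        else:
            path.unlink()
```

Calling clean(cleanup_targets()) only prints the candidates; passing dry_run=False performs the deletion. Either way, the same review has to be done on both Pilot CUs before they are rebooted.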
INSTALL the two new A1811 Enhanced Pilots in the Rack and restart the Axiom:
NOTE: This part of the procedure is disruptive. Make sure no I/O is being generated on the Axiom. This can be checked by logging into the GUI and navigating to "Monitor" tab -> "Statistics and Trending" -> "SAN" -> "LUNs"; the IOPs for each LUN should be 0.
- Verify with the customer that they have shut down ALL of their SAN hosts, then wait 15 minutes so any I/O in Slammer cache gets flushed to disk.
- Power off the Slammers and leave them off until these instructions tell you to power them on (ALL the bricks should be kept powered on). To power off the slammers you must disconnect the power cables from both Slammer CUs.
- Power off and remove both of the existing Pilot CUs in the rack.
- Place both new A1811 Enhanced Pilot CUs in the Chassis. DO NOT APPLY POWER!
- Attach the serial and ETH0 Ethernet cables between the two Pilot CUs.
- Attach the ETH1 Ethernet cables from the pilots to the slammers.
- Attach the management interface (the MGMT port) to the external customer network. The external link MUST be connected before power is applied.
- Attach the keyboard and display to the top Pilot CU.
- [The slammers are still powered off] Apply power to the TOP Pilot CU Only, then wait 30 seconds.
- Apply power to the Bottom Pilot CU Only, then wait 30 seconds.
NOTE: The top Pilot CU should be Active as it was powered on first. The Active Pilot CU will have the pmi_if:0 address of 172.30.80.1 and will have both eth0:0 and eth0 IP interfaces up.
- At this point log into the GUI using the pillar account. If the pilots are not reachable on the customer's network (for example, during the preparation steps you may have configured the new pilots on a separate network using IPs that are not reachable on the customer network), then you will need to use a laptop and Ethernet cable to connect to the management interface port of the active pilot (which should be the top pilot). From the laptop you should be able to log into the GUI and proceed with the rest of this procedure. Once the Axiom is recovered you can change the pilot IPs in the GUI to match the existing pilot IPs (Step 17 in this section, the network configuration check).
- Per the checklist at the beginning of this document, you should know what software release the Axiom is on. If the Axiom is on an older software release than the new pilots, then you MUST upgrade the slammers and bricks: proceed with Step 12 a) below. If the new pilots and the Axiom are on the same release, proceed to Step 13 (power on the Slammers).
- Upgrade the Slammers and Bricks. Log back into the GUI using the pillar user account and navigate to "Support" tab -> click "Software Modules", in the main screen right-click and choose "Update Software". From the "Update Software" screen choose "Always Install" for "AX500 Slammer Software", "AX600 Slammer PROM", "AX600 Slammer Software", "Brick Fibre Channel Firmware", "Brick Fibre Channel V2 Firmware", "Brick SATA Firmware", and "Brick SATA V2 Firmware". Select radio button "Restart and update software (disrupts data access)", and check all boxes under "Software Update Options".
NOTE: Make sure to choose ALL of the software modules listed above, even though some of these modules may not apply to the Axiom you're upgrading. For example, if the Axiom is an AX600 you will still select "AX500 Slammer Software", and vice versa. If some of the brick types don't apply, you will still select them.
Click "Ok" to start the upgrade. There will be additional pop-up screens that the user must acknowledge to proceed with the upgrade; this is normal. Once the upgrade starts, wait about 20 seconds, then proceed with Step 13.
- Power on the Slammers.
- This will initiate a full system restart. Monitor this restart from the Tasks window.
NOTE: The Axiom will boot on the newer software version IF you installed it on the pilots, and will automatically upgrade any RAID firmware. If you staged and installed the customer's current release, then it will only check the firmware. The cold start time isn't affected by more than 5 minutes if you upgrade compared to leaving it on current release.
- The Axiom should be back up and running normally at this point once the startup is complete. The pilots, slammers, and bricks should be seen in a Normal state.
- After the Axiom startup is complete, make sure the System Serial Number is correct. Do NOT attempt to recover if this serial number is wrong. The serial number can be viewed in the GUI by navigating to "Configure" tab -> "Summary" -> "System" -> in the main page "Serial Number:"
- Check the network configuration and correct any entries as needed. To verify the Axiom network settings, log in with the administrator account and navigate to "Configure" tab -> under "Global Settings" click "Networking". In the main screen right-click and choose "Modify Network Settings".
- Send a test call-home when finished.
References
<NOTE:1545203.1> - Pillar Axiom: How to recover a corrupt SSH configuration on both Pilots
<NOTE:1577565.1> - Pillar Axiom: Axiom 300, 500, 600 Pilot CU Substitution Matrix
<BUG:15934846> - PILOT SSH_CONFIG CORRUPTION