Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1517629.1
Update Date:2017-11-29
Keywords:

Solution Type  Technical Instruction Sure

Solution  1517629.1 :   How to Perform an Oracle Fabric Interconnect Firmware (XgOS) Upgrade  


Related Items
  • Oracle Virtual Compute Appliance X3-2 Hardware
  • Oracle Fabric Interconnect F1-15
  • Oracle Fabric Interconnect F1-4
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: SaND-CAP VCAP




In this Document
Goal
Solution
References


Applies to:

Oracle Fabric Interconnect F1-4 - Version All Versions and later
Oracle Fabric Interconnect F1-15 - Version All Versions and later
Oracle Virtual Compute Appliance X3-2 Hardware - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Goal

 Oracle Fabric Interconnect Firmware Upgrade Best Practices

Solution

DISPATCH INSTRUCTIONS
WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED?: Basic Oracle Fabric Interconnect product familiarity and skills


TIME ESTIMATE: 60 minutes


TASK COMPLEXITY: 4

FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:
PROBLEM OVERVIEW: Performing an Oracle Fabric Interconnect firmware upgrade

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?:

Before performing the XgOS upgrade, there are pre-upgrade checks that can be performed.

Before commencing the XgOS upgrade, please review this KB to see if it is relevant to the version of XgOS being installed:

Oracle Virtual Networking - After Upgrade to 4.0.7 XgOS, QDR IB Switch F/W is Not Changed (Doc ID 2204491.1)

Make sure the number of vnics / vhbas per IO Card matches across both Fabric Interconnects. This helps to find server-profiles that are not configured for redundancy.

From the CLI as user ‘admin’, run:

show iocards

Note that the ‘v-resources’ column shows the number of vnics or vhbas terminated on each card:

Example:

admin@f1-4-sc11-a[xsigo] show iocards
slot   state                descr   type                      v-resources
-------------------------------------------------------------------------------
1      up/resourceMissing           sanFc2Port4GbLrCard       1
2      up/up                        nwEthernet10Port1GbCard   0
3      up/up                        nwEthernet1Port10GbCard   1
3 records displayed
admin@f1-4-sc11-a[xsigo]
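
The comparison described above can be scripted once the ‘show iocards’ output from each Fabric Interconnect has been saved as text. The following is a minimal Python sketch, not an Oracle-supplied tool. The function names (parse_iocards, compare_v_resources) are hypothetical, and the sketch assumes the column layout shown in the example, with the slot number first and the v-resources count last on each row.

```python
def parse_iocards(output):
    """Return {slot: v_resources} from captured 'show iocards' text."""
    counts = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with a slot number and end with the v-resources count.
        # This skips the header, the separator and the 'N records displayed' line.
        if len(fields) >= 4 and fields[0].isdigit() and fields[-1].isdigit():
            counts[int(fields[0])] = int(fields[-1])
    return counts


def compare_v_resources(output_a, output_b):
    """Return {slot: (count_a, count_b)} for slots whose counts differ."""
    a, b = parse_iocards(output_a), parse_iocards(output_b)
    return {
        slot: (a.get(slot), b.get(slot))
        for slot in sorted(set(a) | set(b))
        if a.get(slot) != b.get(slot)
    }
```

An empty result means the per-slot counts match. Any slot that is reported should be investigated before the upgrade, since it may point to a server-profile without redundancy.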

 

Check the contents of the logs to see if any warnings or errors are being logged repeatedly.  Running ‘showlog <logfilename>’ is like running ‘tail -f’ on live logs.  It displays the most recent events as they are being logged.

Additionally, compare server-profiles across BOTH Fabric Interconnects to make sure that the vnic and vhba counts match.

EXAMPLE:

 

admin@f1-4-sc11-a[xsigo] show server-profile
name                             state                                           descr                  connection                                                 def-gw                  vnics                  vhbas
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ray_test-bombay                  up/up(someVirtualResourceDown)                                         bombay.xsigo.com@f1-4-sc11-a:ServerPort20                                          1                      1
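
The per-profile comparison can be sketched in the same way from saved ‘show server-profile’ output. This is an illustration under stated assumptions, not a supported utility: the function names are hypothetical, the row layout is assumed to match the example above (profile name first, a state containing ‘/’ second, vnic and vhba counts last), and profiles are assumed to carry the same name on both Fabric Interconnects.

```python
def parse_server_profiles(output):
    """Return {profile_name: (vnics, vhbas)} from captured 'show server-profile' text."""
    profiles = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows carry a state such as 'up/up' in the second column and end
        # with the vnic and vhba counts; header and separator lines do not.
        if (len(fields) >= 4 and "/" in fields[1]
                and fields[-2].isdigit() and fields[-1].isdigit()):
            profiles[fields[0]] = (int(fields[-2]), int(fields[-1]))
    return profiles


def mismatched_profiles(output_a, output_b):
    """Return names whose counts differ, or that exist on only one side."""
    a, b = parse_server_profiles(output_a), parse_server_profiles(output_b)
    return sorted(name for name in set(a) | set(b) if a.get(name) != b.get(name))
```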

As user ‘admin’, run the following ‘showlog’ commands:

 

showlog user.log

showlog syslog.log

showlog xvnd.log

showlog ib.log

showlog opensm.log

 

If there are repetitive ‘warnings’ or ‘errors’ being logged, or questionable messages repeating in the logs (log spew), contact the Oracle Support Hotline and open an SR.  Do not proceed with the XgOS upgrade if you see an ongoing, current loss of vnics, vhbas or link (link=0 in xvnd.log).  You may also need to open an SR with Oracle Support if you cannot resolve a failure to fail over when performing ‘set server-profile * down’ or when setting specific server-profiles down a few at a time.
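
If log output has been captured to a text file, repeated messages can be spotted with a short script instead of by eye. The sketch below is an assumption-laden example: the XgOS log line format is not documented in this article, so it simply treats any line containing "warn" or "error" as a candidate, collapses digit runs so timestamps do not hide repeats, and uses an arbitrary threshold of 5. The function names are hypothetical.

```python
import re
from collections import Counter

KEYWORDS = re.compile(r"warn|error", re.IGNORECASE)


def repeated_messages(log_text, threshold=5):
    """Return [(normalised_message, count)] for warning/error lines seen at least `threshold` times."""
    counts = Counter()
    for line in log_text.splitlines():
        if KEYWORDS.search(line):
            # Collapse digit runs so that timestamps and counters do not make
            # otherwise identical messages look different.
            counts[re.sub(r"\d+", "#", line).strip()] += 1
    return [(msg, n) for msg, n in counts.most_common() if n >= threshold]


def has_link_down(log_text):
    """True if any line reports link=0 (the indicator this article gives for xvnd.log)."""
    return re.search(r"\blink=0\b", log_text) is not None
```

A non-empty result from repeated_messages, or a True result from has_link_down on current xvnd.log output, is a reason to stop and investigate rather than a diagnosis in itself.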

WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE?:

To perform the XgOS upgrade, follow these instructions after reading in full the Product Notes for the specific XgOS version, located here:

http://docs.oracle.com/cd/E38500_01/index.html

Please make sure to read the upgrade recommendations and KNOWN issues before performing the XgOS upgrade.

1) Start with the Fabric Interconnect that is currently the OpenSM Master. To find it, run:

show diagnostics opensm-param

Look at "SM State" to see whether the Fabric Interconnect node is Master or Standby (meshed IB Fabric).  In a dual IB Fabric, both Fabric Interconnects will be Master.  Below is example output of 'show diagnostics opensm-param' in which an external IB Switch is also running OpenSM.  OpenSM should <only> be running on the internal OVN F1-15 IB Switches, not on any external Sun NM2 IB Switch.  The example shows three IB Switches, with the Sun IB Switch running OpenSM as "Master" with priority 14:

admin@f1-4-sca11-a[xsigo] show diagnostics opensm-param
OpenSM $ Current log level is 0x3
OpenSM $ Current sm-priority is 0
OpenSM $    OpenSM Version       : OpenSM 3.3.13
   SM State             : Master
   SM Priority          : 0
   SA State             : Ready
   Routing Engine       : minhop
   Loaded event plugins : <none>

   PerfMgr state/sweep state : Disabled/Sleeping

   MAD stats
   ---------
   QP0 MADs outstanding           : 0
   QP0 MADs outstanding (on wire) : 0
   QP0 MADs rcvd                  : 20984033
   QP0 MADs sent                  : 20984006
   QP0 unicasts sent              : 2160272
   QP0 unknown MADs rcvd          : 0
   SA MADs outstanding            : 0
   SA MADs rcvd                   : 182194380
   SA MADs sent                   : 182194380
   SA unknown MADs rcvd           : 0
   SA MADs ignored                : 0

   Subnet flags
   ------------
   Ignore existing lfts           : 0
   Subnet Init errors             : 0
   In sweep hop 0                 : 0
   First time master sweep        : 0
   Coming out of standby          : 0

   Known SMs

   Port GUID        SM State    Priority
   ---------        --------    --------
   0x1397020100xxxx Standby     0        SELF  <===> A GUID beginning 0x1397 denotes an OVN F1-15 internal IB Switch.
   0x10e0650f3fxxxx Master      14             <===> A GUID beginning 0x10e0 denotes an external Sun NM2 IB Switch, shown here as OpenSM "Master". Priority 14 is higher than the default priority of 0 on the OVN F1-15 IB Switches. ONE of the OVN F1-15 HA pair MUST be OpenSM Master. To resolve this, see the solution immediately below.
   0x1397020100xxxx Standby     0
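
The "Known SMs" check above can be automated against saved ‘show diagnostics opensm-param’ output. This is a rough Python sketch rather than a definitive tool. It assumes the table layout shown in the example, and it assumes, based only on that example, that GUIDs beginning 0x1397 belong to the Fabric Interconnect's internal IB Switches. The names known_sms and external_masters are hypothetical.

```python
import re

# Matches rows of the "Known SMs" table, e.g. "0x1397020100xxxx Standby 0 SELF".
SM_ROW = re.compile(r"^\s*(0x[0-9a-fA-FxX]+)\s+(\w+)\s+(\d+)\b")

# Assumption taken from the example output: internal switch GUIDs begin 0x1397.
FABRIC_GUID_PREFIX = "0x1397"


def known_sms(output):
    """Return [(guid, state, priority)] from captured opensm-param text."""
    rows = []
    for line in output.splitlines():
        match = SM_ROW.match(line)
        if match:
            rows.append((match.group(1), match.group(2), int(match.group(3))))
    return rows


def external_masters(output):
    """Return GUIDs that are Master but do not look like Fabric Interconnect switches."""
    return [guid for guid, state, _ in known_sms(output)
            if state == "Master" and not guid.lower().startswith(FABRIC_GUID_PREFIX)]
```

If external_masters returns anything, an external switch is acting as OpenSM Master and should be corrected before the upgrade.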

 


SOLUTION TO THE ABOVE:

1) Log into the upstream Sun NM2 IB Switch that matches the GUID shown under "Known SMs" and disable OpenSM on that external IB Switch.  OpenSM must be running <only> on one of the internal Mellanox IB Switches in the F1-15 HA pair.

2) Set OpenSM priority to 15 on ONE of the F1-15 HA pair, using this command:


Run as user 'admin' from F1-15 CLI:

# set diagnostics opensm-param priority 15

# resweep


2) Set server-profiles down (a few at a time, until you are comfortable with the result). This *forces* failover and must be done BEFORE upgrading XgOS. It will often uncover problems with misconfigured server failover (misconfigured VLANs, down upstream ports in LAG groups, etc.).

set server-profile <profile_name*> down

Use the * wildcard for pattern matching to set small groups of server-profiles down, and verify that failover occurred correctly for each group.  If at any time failover of a vnic, vhba or server-profile does not occur, STOP the pre-upgrade process and correct the failover failure before proceeding.  DO NOT initiate the XgOS upgrade if any v-star (vnic or vhba) or server-profile failover fails!
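
When profile names do not share a convenient wildcard pattern, the small groups can be prepared ahead of time. The sketch below only builds the command strings for an operator to paste into the CLI one batch at a time. It does not connect to the Fabric Interconnect, and the function name and the batch size of 3 are arbitrary illustrative choices.

```python
def down_commands(profile_names, batch_size=3):
    """Yield lists of 'set server-profile <name> down' commands, one list per batch."""
    names = sorted(profile_names)
    for start in range(0, len(names), batch_size):
        yield [f"set server-profile {name} down"
               for name in names[start:start + batch_size]]


if __name__ == "__main__":
    # Hypothetical profile names, for illustration only.
    for number, batch in enumerate(down_commands(["web1", "web2", "db1", "db2"]), 1):
        print(f"--- batch {number}: verify failover before continuing ---")
        print("\n".join(batch))
```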


3) Export the config, then copy the latest saved config from /var/fsroot/admin to a local system for safekeeping, using an SFTP client (for example FileZilla) or the SCP command from the CLI.  This is the config backup step.

system export <chassisname-date>.xml

4)  If under 4.0 XgOS, perform an upgrade to 4.0.x XgOS.
5)  To perform the upgrade, copy the .xpf file to /var/fsroot/admin on the Fabric Interconnect as user 'root', then as user 'admin' run:

system upgrade <file.xpf>

EXAMPLE:

system upgrade xgos-3.9.2.xpf

 
For additional information please see:

 Chapter 20, "Upgrading XgOS", starting on page 369

OBTAIN CUSTOMER ACCEPTANCE

WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE?:

The Fabric Interconnect will reboot as a result of the XgOS upgrade.  After it has come fully up, log in as user 'root', wait a few minutes, and then su to admin.

Once the IO Cards have come fully up (show iocards), bring up your server-profiles in a staggered fashion (groups of 5 or fewer at a time):

EXAMPLES:

set server-profile profile* up
set server-profile mysystem* up
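
Before bringing profiles up, the ‘show iocards’ output can be checked for cards that are not yet fully up. The following sketch works on saved output text. It assumes, from the example earlier in this article, that a fully up card reports the state ‘up/up’ and that any other state (such as ‘up/resourceMissing’) deserves a look. The function name is hypothetical.

```python
def cards_not_up(output):
    """Return {slot: state} for I/O cards whose state is not 'up/up'."""
    pending = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with the slot number and end with the v-resources count.
        if len(fields) >= 4 and fields[0].isdigit() and fields[-1].isdigit():
            if fields[1] != "up/up":
                pending[int(fields[0])] = fields[1]
    return pending
```

An empty dictionary suggests the cards are ready. Note that a state such as ‘up/resourceMissing’ may already have been present before the upgrade, so compare against the pre-upgrade output before drawing conclusions.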

Verify that all server-profiles, vnics and vhbas are fully up AND verify that the vhbas and vnics are fully up on the hosts BEFORE starting to upgrade the XgOS of the second Fabric Interconnect.  Do NOT perform 'system downgrade' if you encounter problems after the upgrade. Instead, open an SR and upload the Fabric Interconnect diagnostic log bundles plus the upstream ethernet switch configs and logs so that the cause of the failure can be analyzed.

Please call the Oracle Support Hotline and open an SR if you encounter any issues with server-profiles, IO Modules, vnics or vhbas not coming up or not passing traffic after the XgOS upgrade.

NOTE:  If an outage appears to be due to the upgrade to 4.x XgOS and the customer deems it necessary to *rollback* or *downgrade* to the previous XgOS version (for instance, from 4.x XgOS to 3.9.x XgOS), there are some things to be aware of.  The 'system upgrade' command installs a brand new image, whereas the 'system downgrade' command points to a previous XgOS image.

 

When the currently installed version is 4.x XgOS, the 'system upgrade 3.9.x-XGOS.xpf' command cannot be used to go back to 3.9.x XgOS. 4.x XgOS uses a new version of login encryption, and going back with 'system upgrade' breaks login to the Fabric Interconnect.  The only way to recover a Fabric Interconnect that had 4.x XgOS installed and was then taken back to 3.9.x XgOS using the 'system upgrade' command is to ship a new Gen2 Front Panel to the customer and then install the requested version of XgOS.

If the customer has an urgent need to downgrade, the 'system downgrade 3.9.x-XGOS.xpf' command *must* be used instead, to avoid losing login access to the Fabric Interconnect.  The 'system downgrade' command was specifically tested by OVN QA on 4.0.x XgOS downgraded to 3.9.2.  This is the only recent XgOS code branch on which OVN QA tested the 'system downgrade' command.
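
The rule above can be expressed as a small guard that picks the command before anything is typed on the Fabric Interconnect. This is a hedged sketch of the rule as stated in this article, not an Oracle utility. The function names are hypothetical, version numbers are assumed to appear in the form 3.9.2 (as in the file name xgos-3.9.2.xpf used earlier), and any backward move other than 4.x to 3.x is deliberately refused because the article does not cover it.

```python
import re


def parse_version(text):
    """Extract a version tuple such as (3, 9, 2) from '3.9.2' or 'xgos-3.9.2.xpf'."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if not match:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def command_for(installed, target_file):
    """Return the CLI command this article allows for moving to `target_file`."""
    current, target = parse_version(installed), parse_version(target_file)
    if current >= (4,) and target < (4,):
        # Going from 4.x back to 3.9.x with 'system upgrade' breaks login.
        return f"system downgrade {target_file}"
    if target < current:
        raise ValueError("downgrade path not covered by this article; open an SR")
    return f"system upgrade {target_file}"
```

For example, command_for("4.0.7", "xgos-3.9.2.xpf") returns the 'system downgrade' form, so the unsafe 'system upgrade' path is never suggested for that case.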


References

<NOTE:2204491.1> - Oracle Virtual Networking - After Upgrade to 4.0.7 XgOS, QDR IB Switch F/W is Not Changed

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.