
Asset ID: 1-71-1487791.1
Update Date: 2017-03-14
Keywords:

Solution Type: Technical Instruction

Solution  1487791.1 :   SuperCluster - How to cleanly shutdown and startup an Oracle SuperCluster T4-4 or T5-8  


Related Items
  • SPARC SuperCluster T4-4
  • Oracle SuperCluster T5-8 Half Rack
  • SPARC SuperCluster T4-4 Half Rack
  • Oracle SuperCluster T5-8 Full Rack
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>SPARC SuperCluster>DB: SuperCluster_EST




In this Document
Goal
Solution
 Shutdown Procedures
 If running Oracle Solaris Cluster OSC3.3u1/S10 or OSC4.0/S11, shut down the clustering service
 If running Ops Center 12c in SuperCluster mode, halt the Enterprise Controller
  Follow applicable documentation to cleanly shut down all user applications or databases running in zones or LDoms
  Obtain a list of all running zones
  Shut down all running zones
  Obtain a list of all running LDoms
  Obtain the names of the LDoms with direct hardware access
  Stop the domains from the ldm list output that are not on this list
  Stop the guest domain with direct hardware access
  Shut down the CRS stack on all domains running Oracle CRS
  Verify all Oracle processes are stopped and remediate as necessary if they are not
  Shut down the Exadata storage cell services and operating systems
  Shut down the operating system of the control LDom
  Connect to the compute node ILOM and stop /SYS
  Show, and set if need be, the power policy settings so the T4-4 or T5-8 machines do NOT power on automatically when rack power is restored
  Shut down the ZFS Storage Appliance
  The switches do not have specific power-off instructions; they will be powered off when power is removed from the rack
  Flip the breakers on the PDUs to the off position.
 Startup Procedures
  Flip the breakers on all PDUs to the on position
  Verify, and if necessary fix, the partitioning on the IB switches
  Start up the ZFS Storage Appliance
  Verify the startup of the Exadata storage cells
  Bring up the T4-4 or T5-8 systems
  Verify the system
  Restart all applicable applications and test.
References


Applies to:

SPARC SuperCluster T4-4 - Version All Versions to All Versions [Release All Releases]
Oracle SuperCluster T5-8 Full Rack - Version All Versions to All Versions [Release All Releases]
Oracle SuperCluster T5-8 Half Rack - Version All Versions to All Versions [Release All Releases]
SPARC SuperCluster T4-4 Half Rack - Version All Versions to All Versions [Release All Releases]
Oracle Solaris on SPARC (64-bit)

Goal

 Describe the recommended procedure for cleanly powering down and powering up an Oracle SuperCluster T4-4 or T5-8.

 

Solution

 

This note addresses the standard offered configurations of Oracle SuperCluster. If your machine has any variations approved as exceptions, your steps may vary.

 

Shutdown Procedures

If running Oracle Solaris Cluster OSC3.3u1/S10 or OSC4.0/S11, shut down the clustering service first. Run the following on all global zones involved in clustering to prevent failover while applications and zones are being shut down.

# /usr/cluster/bin/cluster shutdown -g 0 -y

If running Ops Center 12c in SuperCluster mode, you will also have to halt the Enterprise Controller so it does not attempt to fail over while CRS is being brought down.

# /opt/SUNWxvmoc/bin/ecadm ha-stop-no-relocate
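
Once halted, the Enterprise Controller state can be confirmed before proceeding. The following check is a suggestion based on the standard Ops Center ecadm utility (same path as the command above); confirm the subcommand against your Ops Center documentation:

# /opt/SUNWxvmoc/bin/ecadm status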

 Follow applicable documentation to cleanly shut down all user applications or databases running in zones or LDoms.

Obtain a list of all running zones

# zoneadm list
global
sol10_zone

 Shut down all running zones

# zoneadm -z sol10_zone shutdown
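
If more than one non-global zone is running, a small loop such as the following (a sketch that takes the zone names straight from the zoneadm list output above) shuts them all down in one pass:

# for Z in $(zoneadm list | grep -v '^global$'); do zoneadm -z $Z shutdown; done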

 Obtain a list of all running LDoms

# ldm list
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
primary          active     -n-cv-  UART    128   523776M  1.1%  4d 19h 50m
orlscclldm01     active     -n----  5001    64     32G      0.0%  11d 2h 59m
ssccn2-app1      active     -t--v-  5000    64    256G     1.6%  3d 23h 45m

 

The T4-4 and T5-8 LDom configurations can vary based on the configuration chosen during installation. If running with 1 LDom, you shut down the machine just as you would any other server, by cleanly shutting down the OS. If running 2 LDoms, you shut down the guest domain first and then the primary (control) domain. If running with 3 or more domains, you will have to identify the domain(s) running purely on virtualized I/O (no direct hardware access) and shut them down first, before moving on to the guest domain(s) with direct hardware access and finally the primary (control) domain.

 Obtain the names of the LDoms with direct hardware access.

T4-4
# ldm  list-io |egrep  "pci@400|pci@700"
pci@400         pci_0           primary
pci@700         pci_3           ssccn2-app1
...
The T5-8 is built a bit differently, but you can identify the same information by searching on SASHBA:
# ldm list-io |grep SASHBA
/SYS/MB/SASHBA0                           PCIE   pci_0    primary  OCC
/SYS/MB/SASHBA1                           PCIE   pci_15   ssccn1-dom3 OCC

 Stop the domains from the ldm list output that are not on this list

# ldm stop-domain orlscclldm01

 Stop the guest domain with direct hardware access

# ldm stop-domain ssccn2-app1
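
At this point ldm list should show only the primary domain as active; the stopped guest domains drop to the bound state. Expect output along these lines (domain names taken from the earlier example; yours will differ):

# ldm list
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  UPTIME
primary          active     -n-cv-  UART    128   523776M  0.9%  4d 19h 55m
orlscclldm01     bound      ------  5001    64    32G
ssccn2-app1      bound      ------  5000    64    256G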

 

 Shut down the CRS stack on all domains running Oracle CRS.

# /u01/app/11.2.0.3/grid/bin/crsctl stop crs

Verify all Oracle processes are stopped and remediate as necessary if they are not

# ps -ef | grep oracle
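
Using a bracketed pattern keeps the grep itself out of the output, and a CRS check confirms the stack is down; both are sketches using the grid home path shown above:

# ps -ef | grep [o]racle
# /u01/app/11.2.0.3/grid/bin/crsctl check crs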

 Shut down the Exadata storage cell services and operating systems

# cd /opt/oracle.SupportTools/onecommand
# dcli -g cell_group -l root 'cellcli -e "alter cell shutdown services all"'
# dcli -g cell_group -l root 'shutdown -h now'
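
After a few minutes the cells should stop responding on the management network. A simple reachability check from the same domain where dcli was run (cell_group is the same group file used above) can confirm they are down:

# for CELL in $(cat cell_group); do ping $CELL; done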

 Shut down the operating system of the control LDom

# shutdown -g0 -i0 -y

 

 Connect to the compute node ILOM and stop /SYS

-> stop /SYS
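
If the graceful stop does not complete in a reasonable time, ILOM also accepts a forced stop; use it only after the graceful attempt, since it removes host power immediately:

-> stop -f /SYS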

 Show, and set if need be, the power policy settings so the T4-4 or T5-8 machines do NOT power on automatically when rack power is restored. The following shows the settings that you want to reach.

-> show /SP/policy

 /SP/policy
    Targets:

    Properties:
        HOST_AUTO_POWER_ON = disabled
        HOST_COOLDOWN = disabled
        HOST_LAST_POWER_STATE = disabled
        HOST_POWER_ON_DELAY = disabled
        PARALLEL_BOOT = enabled
If any of yours are set to enabled, modify them as follows:
-> set /SP/policy HOST_AUTO_POWER_ON=disabled
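
If either of the other host power policies is enabled, the same set syntax applies, for example:

-> set /SP/policy HOST_LAST_POWER_STATE=disabled
-> set /SP/policy HOST_POWER_ON_DELAY=disabled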

 Shut down the ZFS Storage Appliance

Browse to the BUI of both storage heads and, from the dashboard, select the power off appliance button in the upper left section of the screen below the Oracle logo.
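
If the BUI is not reachable, the appliance CLI offers an equivalent. The following is a sketch assuming the standard ZFS Storage Appliance CLI maintenance context and an illustrative host name; confirm against your appliance documentation:

zfs-head1:> maintenance system poweroff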

 

 The switches do not have specific power-off instructions; they will be powered off when power is removed from the rack.

 Flip the breakers on the PDUs to the off position.

Startup Procedures

Please note that if you are running switch firmware 1.1.3-x you will need to run steps to correct the switch InfiniBand partitioning. This is documented in <Document 1452277.1> SPARC SuperCluster Critical Issues. It is highly advisable to upgrade your rack to the latest Quarterly Maintenance Bundle to bring the switches to version 2.0.6 or above and prevent this issue. The link to the download can be found in <Document 1567979.1> SPARC SuperCluster and Exadata Storage Server 11g Release 2 (11.2) Supported Versions.

 Flip the breakers on all PDUs to the on position

Only perform the following steps if the switches are at a firmware version below 2.0.6

 Verify, and if necessary fix, the partitioning on the IB switches

#  smpartition list active
# getmaster

 

The smpartition command should reflect 3 or more partitions (0x0501, 0x0502, 0x0503, etc., depending on configuration), and getmaster should reflect the spine switch as the master. If this is not the case, run the following command:

 

# smpartition start; smpartition commit
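
After the commit, re-run the two checks above and confirm that the partitions are listed and that the spine switch now reports itself as the master subnet manager:

# smpartition list active
# getmaster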

 

If this does not remediate the issue, please open an SR with your SuperCluster CSI and serial number and request an engineer to assist you with the more in-depth remediation steps. Reference this document ID in your SR.

 

Internal remediation steps: before proceeding, check /conf/configvalid on each switch and verify that it contains 1. If it does not at any point during these steps, run echo 1 > /conf/configvalid on that switch.

START:

Modify the host list in the following commands to match the customer's IB switch host names.

From a compute node, run disablesm on all IB switches:

# for IBSW in ib-sw1 ib-sw2 ib-sw3; do ssh $IBSW disablesm ; done

View, and fix if necessary, the network address/netmask on the spine switch (IB1), then reboot it (first time only!):

# vi /etc/sysconfig/network-scripts/ifcfg-eth0
# reboot

From a compute node, run enablesm on all IB switches:

Before running enablesm, verify /conf/configvalid; if it is not correct, log into each switch and fix it.

# for IBSW in ib-sw1 ib-sw2 ib-sw3; do ssh $IBSW cat /conf/configvalid ; done
# for IBSW in ib-sw1 ib-sw2 ib-sw3; do ssh $IBSW enablesm ; done

CHECK:
To see if a master subnet manager has been arbitrated:


 # for IBSW in ib-sw1 ib-sw2 ib-sw3; do ssh $IBSW getmaster ; done

If a subnet manager was arbitrated but it is not ib-sw1 (spine), go back to START: (except skip the network check/reboot steps)

Else, if no subnet manager was arbitrated, verify /conf/configvalid:


# for IBSW in ib-sw1 ib-sw2 ib-sw3; do ssh $IBSW cat /conf/configvalid ; done

From the spine switch

# smpartition start; smpartition commit
Go back to CHECK:

 

 

 Start up the ZFS Storage Appliance

Browse to the BUI of both storage heads; if you can connect, proceed to the Exadata storage cell steps.

 

 

If you cannot connect to the BUI, verify that the 7320 has started by connecting via ssh as root to the SP of each head and issuing the following:

 

-> start /SYS

 Verify the startup of the Exadata storage cells

 

Run the following from cel01 of your SuperCluster as celladmin and verify that the cell services are online and that all griddisks are active.

 

# dcli -g cell_group -l celladmin 'cellcli -e "list cell"'
# dcli -g cell_group -l celladmin 'cellcli -e "list griddisk"'
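
A more detailed grid disk check can confirm that no disks are stuck inactive or offline; this is a sketch using the same dcli/cellcli pattern as above:

# dcli -g cell_group -l celladmin 'cellcli -e "list griddisk attributes name,status,asmmodestatus"'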

 Bring up the T4-4 or T5-8 systems

Log into the ILOM for each T4-4 or T5-8, start /SYS, and then monitor the progress via the /SP/console:

 

-> start /SYS
-> start /SP/console

 Verify the system

Unless configured otherwise by the site database administrators or system administrators, all LDoms, zones, Clusterware, and database-related items should come up automatically as the system boots. If they fail to do so, manually start these components per your site's standard operating procedures. Please verify the system is all the way up via the console before checking dependent items. If for any reason you cannot restart something, gather appropriate diagnostic data and file an SR after consulting with your local administrators. The svcs -xv command will show which system services, if any, did not start and assist you in debugging why.

 

# ldm list
# zoneadm list
# /u01/app/11.2.0.3/grid/bin/crsctl status res -t
# svcs -xv
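
If the CRS resources are slow to come online, a cluster-wide check (same grid home as above) shows whether the clusterware stack has started on every node:

# /u01/app/11.2.0.3/grid/bin/crsctl check cluster -all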

 Restart all applicable applications and test.


References

<NOTE:1452277.1> - SuperCluster Critical Issues
<NOTE:1567979.1> - Oracle SuperCluster Supported Software Versions - All Hardware Types
