Asset ID: | 1-71-1951961.1 |
Update Date: | 2018-01-09 |
Keywords: | |
Solution Type: Technical Instruction Sure Solution
1951961.1 : M8-8 / M7-8 / M7-16 - How to replace a Faulty CMIOU in a CMIOU chassis [VCAP]
Related Items |
- SPARC M8-8
- Oracle SuperCluster M7 Hardware
- SPARC M7-8
- SPARC M7-16
|
Related Categories |
- PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: SPARC-CAP VCAP
- Microlearning>Video>ML-VID-VCAP
|
Oracle Confidential PARTNER - Available to partners (SUN).
Reason: this is now a FRU
Applies to:
SPARC M7-8 - Version All Versions to All Versions [Release All Releases]
Oracle SuperCluster M7 Hardware - Version All Versions to All Versions [Release All Releases]
SPARC M7-16 - Version All Versions to All Versions [Release All Releases]
SPARC M8-8 - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.
Goal
CAP PROBLEM OVERVIEW: M8-8 / M7-8 / M7-16 CMIOU chassis - CMIOU Failure
*********************************************************************
To report errors or request improvements on this procedure, please go to
My Oracle Support, and put a comment on Doc ID: 1951961.1
*********************************************************************
ESD Caution:
- Circuit boards and drives contain electronic components that are extremely sensitive to static electricity. Ordinary amounts of static electricity from clothing or the work environment can destroy the components located on these boards. Do not touch the components along their connector edges.
- Use an antistatic wrist strap. Attach one end of the strap to your wrist and the other end to the chassis, using the adhesive end or the metal plug, depending on the type of strap.
- Use an Antistatic Mat. Place ESD-sensitive components such as motherboards, memory, and other PCBs on an antistatic mat.
Contamination Caution:
- Dust particles from packaging material are the number one cause of datacenter contamination. Remove all packaging material, except the ESD-safe packaging, before you enter the datacenter.
Solution
DISPATCH INSTRUCTIONS
WHAT SKILLS DOES THE ENGINEER NEED: M8-8 / M7-8 / M7-16 product training
TASK COMPLEXITY: 2
TIME ESTIMATE: 90 minutes
HOT replacement
FIELD ENGINEER INSTRUCTIONS : CMIOU FPGA update may be required during CMIOU replacement. See Below
WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY? : The Physical Domain to which the CMIOU is assigned must be shut down (HOST stopped).
WHAT ACTION DOES THE ENGINEER NEED TO TAKE:
Important notes :
- After installing the new CMIOU, make sure that the FPGA version is the latest available and upgrade as appropriate - Refer to Doc 2076387.1
- Before returning the removed CMIOU check whether the component must be CPASed - Refer to Doc 2079830.1 (Partner service providers exempt)
Caution
- To prevent overheating, fill unused CMIOU slots with CMIOU filler panels
- Remove a faulty CMIOU only when a replacement CMIOU is available. Install the new CMIOU as quickly as possible, ideally within 10 minutes.
- You must prepare a CMIOU before removing it from the server. Complete steps 1 through 3 before you unseat or remove a CMIOU.
Preparing a CMIOU for Removal
1. Determine which CMIOU requires service by using one of these Oracle ILOM commands to display faulty components:
-> show faulty
-> show /System/Open_Problems
faultmgmtsp> fmadm faulty
2. Locate the affected CMIOU by its amber Service Required LED
3. Prepare the affected CMIOU for removal:
3.1 Determine whether any of the logical domains in the Physical Domain use VersaBoot (iSCSI over IPoIB) for booting.
If so, refer to SPARC M8 and SPARC M7 Series Servers : VersaBoot (iSCSI over IPoIB) - CMIOU/eUSB replacement considerations (Doc ID 2107700.1)
3.2 Determine whether the CMIOU is powered off.
-> show /System/DCUs -level 4 location=="CMIOUy*" power_state (where y is the CMIOU number identified in steps 1 and 2)
3.3 Determine the next step
- If the CMIOU is running (power_state=on), go to Step 3.4
- If the CMIOU is not running (power_state=off), go to Step 3.8
3.4 Determine to which PDomain the CMIOU is assigned.
-> show /System/DCUs/DCU_x host_assigned (where x is the DCU_x number reported in step 3.2 )
3.5 Access the PDomain console.
-> start /Servers/PDomains/PDomain_z/HOST/console (where z is the HOSTz number reported in step 3.4 )
- Connect to the host console in a separate terminal session before stopping the host
- Ensure that you can view and capture all messages reported by the system while you stop the PDomain in the next step
3.6 Stop the PDomain
-> stop /Servers/PDomains/PDomain_z/HOST (where z is the HOSTz number reported in step 3.4 )
3.7 Check the power_state of /System/DCUs/DCU_x/CMIOU_y, repeating the command until power_state reports off
-> show /System/DCUs/DCU_x/CMIOU_y power_state
3.8 Prepare the CMIOU for removal and verify that it is ready to remove.
-> set /System/DCUs/DCU_x/CMIOU_y action=prepare_to_remove
-> show /System/DCUs/DCU_x/CMIOU_y health
When the CMIOU is ready to remove, the health property displays Offline and the blue Ready to Remove LED lights. Continue with step 4.
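The branching in steps 3.3 through 3.8 can be sketched as a small helper that lists the ILOM commands still to be typed. This is only an illustrative sketch, not an Oracle tool: the function name removal_plan and its parameters are hypothetical, the PDomain number is assumed to be the one already read in step 3.4, and the command strings are copied from the steps above. It does not connect to the SP or run anything.

```python
from typing import List


def removal_plan(dcu: int, cmiou: int, pdomain: int, power_state: str) -> List[str]:
    """List the ILOM commands for steps 3.5-3.8, in the order they are typed.

    power_state is the value read in step 3.2 ("on" or "off", any case).
    pdomain is the PDomain number read in step 3.4 (ignored if already off).
    """
    target = f"/System/DCUs/DCU_{dcu}/CMIOU_{cmiou}"
    host = f"/Servers/PDomains/PDomain_{pdomain}/HOST"
    state = power_state.strip().lower()
    if state not in ("on", "off"):
        raise ValueError(f"unexpected power_state: {power_state!r}")

    plan: List[str] = []
    if state == "on":
        # Steps 3.5-3.7 apply only while the CMIOU is still running.
        plan.append(f"start {host}/console")         # step 3.5, separate session
        plan.append(f"stop {host}")                  # step 3.6
        plan.append(f"show {target} power_state")    # step 3.7, repeat until off
    # Step 3.8 applies in both cases.
    plan.append(f"set {target} action=prepare_to_remove")
    plan.append(f"show {target} health")             # expect Offline
    return plan


if __name__ == "__main__":
    for command in removal_plan(dcu=0, cmiou=1, pdomain=0, power_state="on"):
        print("->", command)
```

Printing the plan before starting gives the engineer a checklist with the actual DCU, CMIOU and PDomain numbers filled in, which reduces the chance of stopping the wrong host.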
Prepare the replacement CMIOU for usage
4. Unpack the new CMIOU on a static-safe mat.
A CMIOU is heavy. A fully-loaded CMIOU weighs 25 lbs (11.3 kg). Use two hands when handling a CMIOU and do not handle it by the ejectors.
5. Remove the plastic cover from the connectors on the new CMIOU and set it aside for installation on the connectors of the old CMIOU once you have removed it from the system. You must install this cover on the CMIOU that you return to Oracle.
6. Remove the top cover from the replacement CMIOU
- Press down on the green button at the top of the cover to disengage the cover from the CMIOU
- While pressing the button, grasp the rear edge of the cover and slide it toward the rear of the CMIOU until it stops
- Lift the cover off
Removing a CMIOU
7. Unseat the faulty CMIOU
- Pinch the latch on the back of each ejector arm.
- Pull the ejector arms toward you to disengage the CMIOU connectors from the server.
- Grasp the ejector arm latches as close to the base as possible and pull the CMIOU one-third to halfway out of the server.
8. Remove the faulty CMIOU
The rear of the unit is heavy. The CMIOU weighs 25 lbs (11.3 kg). Use two hands to remove the CMIOU from the chassis.
- Fold the ejector arms back together, toward the center of the CMIOU until they latch into place. This will keep the levers from getting damaged when you pull the CMIOU out.
- Carefully remove the CMIOU from the server, using two hands, and avoid bumping the rear connectors
9. Place the faulty CMIOU on an antistatic mat.
Install the plastic connector cover that you removed from the new CMIOU onto the connectors of the faulty CMIOU. Ensure that the connector cover is fully engaged and centered over the connectors.
10. Remove the top cover from the failing CMIOU
- Press down on the green button at the top of the cover to disengage the cover from the CMIOU
- While pressing the button, grasp the rear edge of the cover and slide it toward the rear of the CMIOU until it stops
- Lift the cover off
11. Transfer DIMMs and PCIe components to the new replacement CMIOU.
12. If the faulty CMIOU is the only CMIOU in a logical domain guest that uses VersaBoot (iSCSI over IPoIB) for booting, and the eUSB disk is the only disk in the boot pool, remove the eUSB disk so that you can reinstall it in the new CMIOU.
Installing a CMIOU
13. Reinstall the cover on the replacement CMIOU and slide the cover forward until the latch clicks into place.
14. Insert the replacement CMIOU into its slot
- Carefully slide the CMIOU less than halfway into the slot, taking care to avoid bumping the connectors on the back of the CMIOU
- Open the green CMIOU levers so that they are fully open
- Insert the CMIOU back into its slot in the server until the levers begin to engage
- Press the levers back together toward the center of the CMIOU, and then press the levers firmly against the CMIOU to fully seat it into the server
15. After insertion, check the event logs to confirm whether a CMIOU FPGA update is required
Example :
-> show /SP/logs/event/list/
Event
ID Date/Time Class Type Severity
----- ------------------------ -------- -------- --------
256 Tue Dec 15 07:21:23 2015 Chassis Log major
/SYS/CMIOU6/FPGA update required.
16. If an FPGA update is required, refer to SPARC M7 Series Servers : CMIOU FPGA firmware update (Doc ID 2085049.1).
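When the event log is long, the FPGA message is easy to miss. As a rough sketch, captured log text (for example, saved from the terminal session) could be scanned for that message. The function name cmious_needing_fpga_update is hypothetical, and the pattern assumes the message wording shown in the example above; other firmware releases may word it differently.

```python
import re
from typing import List

# Pattern based on the example message "/SYS/CMIOU6/FPGA update required."
FPGA_MSG = re.compile(r"(/SYS/CMIOU\d+)/FPGA update required")


def cmious_needing_fpga_update(log_text: str) -> List[str]:
    """Return the unique /SYS/CMIOUn paths flagged in the log, in log order."""
    flagged: List[str] = []
    for match in FPGA_MSG.finditer(log_text):
        path = match.group(1)
        if path not in flagged:
            flagged.append(path)
    return flagged


if __name__ == "__main__":
    sample = (
        "256 Tue Dec 15 07:21:23 2015 Chassis Log major\n"
        "/SYS/CMIOU6/FPGA update required.\n"
    )
    print(cmious_needing_fpga_update(sample))  # ['/SYS/CMIOU6']
```

An empty list only means the pattern was not found in the captured text; the restricted-shell version check later in this procedure remains the authoritative confirmation.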
Return the faulted component to Oracle.
Caution - The removed CMIOU must be properly repackaged to prevent damage during return transportation to Oracle. Repackage the CMIOU in the same way as the delivered FRU. See the Service Manual for details and illustrations about proper repackaging : Return a CMIOU to Oracle
Caution - Do not place tools inside a container that is being used to return a CMIOU.
- Ensure that the protective connector cover, which you removed from the replacement CMIOU, is installed on the CMIOU that you are returning to Oracle.
- Position the CMIOU on the inner container.
- Ensure that the protruding end of the CMIOU latch is centered in the window of the inner container.
- Place the corrugated sheet on top of the CMIOU with the “Top Rear” marking on the rear of the chassis facing up. Ensure that the corrugated sheet extends over the connector.
- Close the inner container and secure the container with tape.
- Fold the bottom and top flaps of the inner container over the CMIOU.
- Fold the side flaps of the inner container over the CMIOU.
- Assemble the outer container and place the packing cushions as described in the Service Manual.
- Close the outer shipping container and seal it with the packaging tape supplied by Oracle.
OBTAIN CUSTOMER ACCEPTANCE
WHAT ACTION DOES THE CUSTOMER NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:
Verify that there are no faulty components
- -> show faulty
- -> show /System/Open_Problems
- faultmgmtsp> fmadm faulty
Perform one of the following tasks based on your verification results
- If the previous steps did not clear the fault, refer to doc 1309092.1 for information about the tools and methods you can use to diagnose and clear component faults.
- If the previous steps indicate that no faults have been detected, the component has been replaced successfully. No further action is required
Before restarting the PDomain, confirm from restricted shell that all of the components are running the latest respective FPGA version :
[(restricted_shell) sp0:~]# hw version | grep "M7_FPGA"
Refer to SPARC M7 Series Servers : CMIOU FPGA firmware update (Doc ID 2085049.1) if any further FPGA update is required.
If required, for the respective host, change the values for the /HOSTx/diag default_level and hw_change_level before starting the host.
-> set /HOSTx/diag/ default_level=max
-> set /HOSTx/diag/ hw_change_level=max
Restart the PDomain you stopped in step 3.6, and monitor/capture the HOST console output (use two windows):
-> start /Servers/PDomains/PDomain_z/HOST/console
-> start /Servers/PDomains/PDomain_z/HOST
Restart software applications per applicable administration guides to resume system operation.
After all PDomains have restarted, repeat the steps to verify that there are no faulty components, to ensure that starting the PDomains has not triggered new faults.
If a ZFS fault is reported against the eUSB disk from the CMIOU being replaced, make sure the label on the disk is correct; format/label as needed.
See SPARC M8 and SPARC M7 Series Servers : ZFS fault on eUSB disk after CMIOU replacement (Doc ID 2184100.1)
Return the /HOSTx/diag default_level and hw_change_level to their original values.
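Restoring the diag levels is easy to forget once the host is back up. One way to keep track, sketched below, is to note the original values before raising them and to generate both sets of commands up front. The helper name diag_commands is hypothetical, and the "min" values and the HOST0 name are placeholders for whatever the system actually reported; the command form follows the set /HOSTx/diag/ examples above.

```python
from typing import Dict, List

DIAG_PROPS = ("default_level", "hw_change_level")


def diag_commands(host: str, values: Dict[str, str]) -> List[str]:
    """Build one 'set' command per diag property for the given host."""
    return [f"set /{host}/diag/ {prop}={values[prop]}" for prop in DIAG_PROPS]


if __name__ == "__main__":
    # Placeholder values: record what the system actually showed beforehand.
    original = {"default_level": "min", "hw_change_level": "min"}
    raised = {prop: "max" for prop in DIAG_PROPS}

    print("Before starting the host:")
    for command in diag_commands("HOST0", raised):
        print("->", command)

    print("After verification:")
    for command in diag_commands("HOST0", original):
        print("->", command)
```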
======================== Other info =====================
REFERENCE INFORMATION: Service Manual: http://docs.oracle.com/cd/E55211_01/html/E55215/index.html
Attachments
This solution has no attachment