Asset ID: 1-71-1004879.1
Update Date: 2017-06-06
Solution Type: Technical Instruction Sure Solution
1004879.1: Sun Fire[TM] 3800, 48x0, 6800, E2900, E4900, E6900, V1280 or Netra[TM] 1280 or 1290: Resetting a component's CHS status using setchs
Related Items
- Sun Netra 1280 Server
- Sun Fire 3800 Server
- Sun Fire 6800 Server
- Sun Fire E2900 Server
- Sun Fire 4810 Server
- Sun Fire V1280 Server
- Sun Fire 4800 Server
- Sun Netra 1290 Server
- Sun Fire E6900 Server
- Sun Fire E4900 Server
Related Categories
- PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: SF-x8x0/Ex900
- _Old GCS Categories>Sun Microsystems>Servers>Midrange Servers
- _Old GCS Categories>Sun Microsystems>Servers>Entry-Level Servers
- _Old GCS Categories>Sun Microsystems>Servers>NEBS-Certified Servers
- _Old GCS Categories>Sun Microsystems>Servers>Midrange V and Netra Servers
PreviouslyPublishedAs
206842
Applies to:
Sun Netra 1290 Server - Version All Versions and later
Sun Fire 3800 Server - Version All Versions and later
Sun Fire 4810 Server - Version All Versions and later
Sun Fire V1280 Server - Version All Versions and later
Sun Fire 6800 Server - Version All Versions and later
All Platforms
Goal
This document describes how to re-enable a component that has been marked Faulty or Suspect by Component Health Status (CHS). It applies to the Sun Fire[TM] 3800, 4800, 4810, 6800, E2900, E4900, E6900, and V1280 systems and to the Netra[TM] 1280 and 1290 systems.
The showchs command, run at the System Controller (SC) or lom prompt, might report Faulty or Suspect component(s) similar to the following example:
lom>showchs
Component Status
--------------- --------
/N0/SB2 Suspect
/N0/SB2/P0 Faulty
/N0/SB2/P1 Faulty
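When a captured showchs listing is long, it can help to tabulate the flagged components before opening a service request. The following is a minimal Python sketch, not an Oracle-supplied tool. It assumes the two-column "Component Status" layout shown in the example above, and the function name parse_showchs is hypothetical.

```python
def parse_showchs(text):
    """Return {component: status} for Faulty/Suspect rows of showchs output."""
    flagged = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly two fields: a component path and a status.
        # The header and the dashed separator also have two fields, but their
        # second field is not a status we are looking for, so they are skipped.
        if len(parts) == 2 and parts[1] in ("Faulty", "Suspect"):
            flagged[parts[0]] = parts[1]
    return flagged


# Sample text copied from the example above.
sample = """lom>showchs
Component Status
--------------- --------
/N0/SB2 Suspect
/N0/SB2/P0 Faulty
/N0/SB2/P1 Faulty
"""

flagged = parse_showchs(sample)
for component, status in flagged.items():
    print(component, status)
```

The sketch only reads text that has already been captured from the SC. It does not connect to the SC or change any status.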
prtdiag may also report component(s) as failed or disabled, as in the following example:
Fru Operational Status:
-------------------------
Location Status
-------------------------
SB0 failed
SB0/P2 disabled
SB0/P3 disabled
SB0/P0 disabled
SB0/P1 disabled
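In the prtdiag example above, the board and every processor on it are listed together. Grouping the rows by board makes that relationship easier to see when several boards are affected. This is a rough Python sketch under the assumption that each row is a location followed by "failed" or "disabled", as in the example. The function name group_fru_status is hypothetical.

```python
from collections import defaultdict


def group_fru_status(text):
    """Group 'Location Status' rows by system board (the part before '/')."""
    boards = defaultdict(dict)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] in ("failed", "disabled"):
            location, status = parts
            board = location.split("/")[0]  # "SB0/P2" -> "SB0"
            boards[board][location] = status
    return dict(boards)


# Sample text copied from the prtdiag example above.
sample = """Fru Operational Status:
-------------------------
Location Status
-------------------------
SB0 failed
SB0/P2 disabled
SB0/P3 disabled
SB0/P0 disabled
SB0/P1 disabled
"""

grouped = group_fru_status(sample)
for board, entries in grouped.items():
    print(board, entries)
```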
Important Notes: Disabled hardware should be investigated by Support Services prior to resetting any CHS status for any component(s).
- Failure to investigate why component(s) were marked Suspect or Faulty prior to resetting their status could leave the system exposed to future outages.
- You are strongly encouraged to open a service request and have Support Services examine the appropriate data to validate if the component(s) status should be reset or if hardware needs to be replaced.
Please note that it is fairly common to have to reset component(s) status following service actions (hardware replacement). This is especially true in the case of CPU or Memory errors, where a CPU and its entire collection of associated Memory DIMMs may be marked Suspect.
Solution
Procedure to reset a component's CHS status.
1. As stated before, a support engineer should validate that the component(s) CHS status should be reset.
The support engineer should analyze the data and determine whether the CHS status should be reset or whether the component should be replaced.
2. Assuming that a Support Services engineer has verified that the CHS status needs to be reset, the following options exist to reset its status:
- If the system is running ScApp earlier than 5.20.15 (i.e., 5.20.14 or lower; note that 5.21.x IS also lower)
The CHS status can only be reset by the support services engineer using Oracle Shared Shell if that option exists. This is because the setchs command is ONLY available in a restricted service mode on the SC for which a service engineer is required to perform the procedure.
You are encouraged to upgrade ScApp to avoid this inconvenience (See STEP 3).
If Shared Shell is not an option for a particular site, a field engineer must be dispatched, but this may involve Time and Material charges depending on contract terms.
- If the system is running ScApp 5.20.15 or higher
The CHS status can be reset from the SC or lom prompt by anyone who can log in to the Main SC. No special access is required.
Perform the following steps:
- Verify what is currently marked as Suspect or Faulty.
6800-sc1:SC> showchs -b
Component Status
--------------- --------
SB3/p0 Faulty
- Reset the CHS status of the component(s) in question.
6800-sc1:SC> setchs -s OK -r "service_request_number" -c SB3/p0
- Validate that the component's status has been reset.
6800-sc1:SC> showchs -b
Component Status
--------------- --------
3. If a component was marked Faulty, it will not be back in the configuration until it has been run through POST.
The component must be 'DR'd' (Dynamic Reconfiguration) out of and then back into the domain, or the domain must be rebooted (sometimes known as 'keyswitched') to prompt this testing.
Assuming the component passes POST testing, it should be configured back into the domain. Contact Support Services if this presents any problems. Make sure to provide the console log showing the POST execution so they can diagnose any issues that remain.
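When several components need to be reset, as in step 2 above, the setchs lines can be prepared ahead of time and the follow-up showchs -b output checked against the same list. The following Python sketch only builds and checks text. It does not connect to the SC. The setchs options (-s, -r, -c) are taken from the example in step 2. The function names setchs_commands and still_flagged are hypothetical.

```python
def setchs_commands(components, sr_number):
    """Build one setchs line per component, mirroring the step 2 example."""
    if '"' in sr_number:
        raise ValueError("service request number must not contain quotes")
    return [f'setchs -s OK -r "{sr_number}" -c {c}' for c in components]


def still_flagged(showchs_output, components):
    """Return the components that still appear in 'showchs -b' output."""
    listed = set()
    for line in showchs_output.splitlines():
        parts = line.split()
        if parts:
            listed.add(parts[0])  # first field of each row is the component
    return [c for c in components if c in listed]


components = ["SB3/p0"]
commands = setchs_commands(components, "service_request_number")
for command in commands:
    print(command)

# showchs -b output before and after the reset, copied from step 2.
before = "Component Status\n--------------- --------\nSB3/p0 Faulty\n"
after = "Component Status\n--------------- --------\n"
print(still_flagged(before, components))
print(still_flagged(after, components))
```

An empty list from still_flagged corresponds to the empty showchs -b listing shown in the validation step. The component still has to run through POST as described in step 3.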
Internal Only Instructions for Support Service engineers
Engineers should validate why the component's status has been marked CHS Faulty or Suspect prior to resetting its status.
Utilize Document 1010056.1 to validate whether the component that is currently disabled, Faulty, Suspect, or Missing is defective or not.
If it is defective, the FRU should be replaced instead of having its status reset.
If it is determined that a component's CHS status needs to be reset, do so according to the version of ScApp that is installed.
If the system is running ScApp 5.20.15 or higher:
Have the customer follow the procedure documented in the public section of this knowledge article (see Step 2).
If the system is running ScApp 5.20.14 or lower (5.21.x IS lower):
You MUST generate a service mode password and then reset the status of the device for the customer. The customer should not be given access to service mode if at all possible. You should make every attempt to perform this procedure for the customer using Oracle Shared Shell.
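The version rule has one non-obvious case: 5.21.x is treated as lower than 5.20.15 even though it sorts higher numerically. The following Python sketch encodes the rule as stated in this article. It assumes the version string has three dotted numeric fields, as in "5.13.0009", and compares any release other than 5.21.x numerically, which is an assumption for releases this article does not mention. The function name setchs_needs_service_mode is hypothetical.

```python
def setchs_needs_service_mode(scapp_version):
    """True if, per this article, setchs is only available in service mode."""
    major, minor, patch = (int(p) for p in scapp_version.split(".")[:3])
    # The article states that 5.21.x counts as "lower" despite its number.
    if (major, minor) == (5, 21):
        return True
    # Otherwise compare numerically against 5.20.15 (assumption for releases
    # the article does not mention).
    return (major, minor, patch) < (5, 20, 15)


# Illustrative version strings; only 5.13.0009 appears verbatim in this article.
for version in ("5.13.0009", "5.20.14", "5.20.15", "5.21.3"):
    print(version, setchs_needs_service_mode(version))
```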
If you need to reset the status using service mode, perform the following steps:
1. Obtain the System Controller's HostID, ScApp version, and RTOS version.
To obtain this information, enter a carriage return in place of the password three times:
Connected to Hostname-sc.
Escape character is '^]'.
Enter Password: <--- Enter Return Here
Invalid password.
Enter Password: <--- Enter Return Here
Invalid password.
Enter Password: <--- Enter Return Here
Invalid password.
HostID: 83195a96
ScApp version: 5.13.0009
RTOS version: 23
2. Generate a Service Mode password.
Take the information from step 1 and use the Service Mode Password Generator to generate a service mode password.
A backup is available: Backup Service Mode Password Generator
3. Utilize Oracle Shared Shell to connect to the customer's system and perform the reset procedure.
Where customers are unable to use Oracle Shared Shell, follow the recommendations in Document 1010655.1 and directly supervise the customer's use of this access to reset the CHS status.
4. Verify what is currently marked as Suspect or Faulty.
6800-sc1:SC[service]> showchs -b
Component Status
--------------- --------
SB3/p0 Faulty
5. Reset the CHS status of the component in question.
6800-sc1:SC[service]> setchs -s OK -r "service_request_number" -c SB3/p0
6. Validate that the component's status has been reset.
6800-sc1:SC[service]> showchs -b
Component Status
--------------- --------
NOTE: To exit service mode, simply enter 'service' again.
7. The component will have to have POST executed to return it to service.
This can be accomplished by executing a setkeyswitch on or a DR operation.
When performing this action, monitor POST to ensure that no errors are detected on this newly reset device.
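The three values collected in step 1 are easy to mistype when copied by hand from a session log. As a convenience, the following Python sketch pulls them out of captured session text. It assumes the "HostID:", "ScApp version:" and "RTOS version:" labels appear at the start of a line, exactly as in the step 1 example. The function name parse_sc_identity is hypothetical.

```python
import re


def parse_sc_identity(session_text):
    """Extract HostID, ScApp version, and RTOS version from session text."""
    fields = {}
    for key in ("HostID", "ScApp version", "RTOS version"):
        # Match "<label>: <value>" at the start of a line.
        pattern = rf"^{re.escape(key)}:\s*(\S+)"
        match = re.search(pattern, session_text, re.MULTILINE)
        if match:
            fields[key] = match.group(1)
    return fields


# Tail of the session shown in step 1.
banner = """Enter Password:
Invalid password.
HostID: 83195a96
ScApp version: 5.13.0009
RTOS version: 23
"""

identity = parse_sc_identity(banner)
for key, value in identity.items():
    print(key, value)
```

A label that is missing from the text is simply left out of the result, so an incomplete capture is easy to spot.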
Background information on why you might need to reset CHS status.
There may be times when a good component, such as a CPU or system board, is marked as faulty. Here are some reasons good components get marked as bad:
Example 1: CR 4868106 - Upgrading to 5.15.0 without following upgrade procedures can lead to a "ParitySingle error" and a CHS disabled SB.
Example 2: POST fails test ID 6.1, with an error like: ERROR: TEST=Memory Tests,SUBTEST=Memory Addressing ID=61.1
In this situation, the CPU is failed in order to disable the memory it controls, but the CPU is fine. It is the memory DIMM(s) which need to be replaced. For the case of bad memory, here is what you need to do:
- Use setchs to re-enable the CPU.
- Verify with showchs that the pending status is 'ok'.
- DR the SB out and replace the memory.
- DR the SB back in.
When you reinsert the SB, the local tests will be sufficient to change the CHS status from 'pending' to 'current'. You do not need to do a setkeyswitch off on the domain.
These cases are not the only ones where a good component can be marked as faulty. The point is, question the recent history of the machine and any maintenance activity when you have a CHS disabled component, before proceeding to re-enable it.
A word of caution - do not just 'blindly' re-enable a component, since the system disabled it for a reason. When in doubt, seek the advice of a senior engineer by collaborating with the next level of technical support.
Previously Published As 72066
References
Document 1019144.1 - Data Requirements reference: What data is needed in order to troubleshoot my software or hardware problem?
Document 1010655.1 - Where can I get the service mode password for a Sun Fire[TM] 3800/48x0/6800/E4900/E6900/E2900/V1280 and Netra[TM] 1280/1290 server?