Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1005071.1
Update Date:2017-02-02
Keywords:

Solution Type  Problem Resolution Sure

Solution  1005071.1 :   Sun Fire[TM] Servers (3800/4800/4810/6800/E4900/E6900): I2C communication problem  


Related Items
  • Sun Netra 1280 Server
  • Sun Fire 15K Server
  • Sun Fire E20K Server
  • Sun Fire 3800 Server
  • Sun Fire 6800 Server
  • Sun Fire E2900 Server
  • Sun Fire E25K Server
  • Sun Fire 4810 Server
  • Sun Fire V1280 Server
  • Sun Fire 12K Server
  • Sun Fire 4800 Server
  • Sun Fire E4900 Server
  • Sun Fire E6900 Server
  • Sun Netra 1290 Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: SF-x8x0/Ex900
  • _Old GCS Categories>Sun Microsystems>Servers>Midrange Servers

Previously Published As
207128


Applies to:

Sun Fire V1280 Server - Version All Versions and later
Sun Netra 1280 Server - Version All Versions and later
Sun Fire 3800 Server - Version All Versions and later
Sun Fire 4800 Server - Version All Versions and later
Sun Fire 4810 Server - Version All Versions and later
All Platforms

Symptoms

How to resolve an I2C communication problem on Sun Fire [TM] Midrange Servers.

On Sun Fire[TM] Midrange servers, a message similar to the following appears in the System Controller (SC) logs (showlogs output) and indicates an I2C communication problem:

Feb 28 08:51:55 sc0 Domain-A.SC: [ID 423004 local0.error] /N0/SB0 cannot set/get LEDs state due to sun.serengeti.I2cException: I2cComm.busyWait: busyWait() timeout waiting for RRDY, status=0x0022c008, bus=0(/N0/SB0) ring=00 addr=22.
Feb 28 08:57:33 sc0 Platform.SC: [ID 912108 local0.error] /N0/SB0: Could not get i2cSwitch

In the example above, I2C communication is broken between the main SC and System Board (SB) 0.  Your error message might instead involve the SC and any of several other devices (SB, IB, RP, etc.).
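
To read such a message quickly, it can help to pull out the board path and bus fields. The following is a minimal sketch, assuming the message layout matches the example above. The regular expression and the name parse_i2c_timeout are illustrative assumptions, not an official log grammar or SC tool.

```python
import re

# Matches the trailing fields of the busyWait timeout message shown above.
# Field names come from the message text; the pattern itself is an assumption
# based only on the example in this article.
I2C_TIMEOUT = re.compile(
    r"status=(?P<status>0x[0-9a-fA-F]+),\s*"
    r"bus=(?P<bus>\d+)\((?P<board>/[^)]+)\)\s+"
    r"ring=(?P<ring>[0-9a-fA-F]+)\s+"
    r"addr=(?P<addr>[0-9a-fA-F]+)"
)

def parse_i2c_timeout(line):
    """Return the I2C fields found in a log line as a dict, or None."""
    match = I2C_TIMEOUT.search(line)
    return match.groupdict() if match else None

sample = (
    "Feb 28 08:51:55 sc0 Domain-A.SC: [ID 423004 local0.error] /N0/SB0 cannot "
    "set/get LEDs state due to sun.serengeti.I2cException: I2cComm.busyWait: "
    "busyWait() timeout waiting for RRDY, status=0x0022c008, bus=0(/N0/SB0) "
    "ring=00 addr=22."
)
fields = parse_i2c_timeout(sample)
print(fields)
```

The board field (/N0/SB0 in this example) names the device the SC could not reach.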

Changes

 Any recent system maintenance should be considered a possible cause of I2C issues.

Cause

  I2C communication problem on Sun Fire [TM] Midrange Servers.

Solution

 

Resolving i2c error events.

Example error messages from a Sun Fire[TM] Midrange server's SC Platform log file (showlogs):

 

Feb 28 08:50:55 sc0 Platform.SC: [ID 132274 local0.error] Error attempting to access /N0/SB0sun.serengeti.HpuFailedException: L1Hpu getPower: sun.serengeti.I2cException: I2cComm.busyWait: busyWait() timeout waiting for RRDY, status=0x0022c008, bus=0(/N0/SB0) ring=00 addr=22
Feb 28 08:57:33 sc0 Platform.SC: [ID 912108 local0.error] /N0/SB0: Could not get i2cSwitch

 

Example error messages from a Sun Fire[TM] Midrange server's Domain log file (showlogs -d <domain ID> -v):

 

Feb 28 08:51:55 sc0 Domain-A.SC: [ID 423004 local0.error] /N0/SB0 cannot set/get LEDs state due to sun.serengeti.I2cException: I2cComm.busyWait: busyWait() timeout waiting for RRDY, status=0x0022c008, bus=0(/N0/SB0) ring=00 addr=22.
Feb 28 08:52:07 sc0 Domain-A.SC: [ID 423004 local0.error] /N0/SB0 cannot set/get LEDs state due to sun.serengeti.I2cException: I2cComm.busyWait: busyWait() timeout waiting for RRDY, status=0x0022c008, bus=0(/N0/SB0) ring=00 addr=22.
Feb 28 08:59:12 sc0 Domain-A.SC: [ID 136975 local0.notice] Domain Shell - A: setkey on: Initiating keyswitch: on, domain A.

 

Analysis of the above examples:  I2C communication is broken between Main SC and System Board 0.

Suspect FRUs:   SB0, SC, CP  (in order from most likely to least likely)

The fault could lie in the Centerplane (CP) or the SC; however, the SB is the most likely cause. Always take previous service activity into account.  If there has been no recent service activity, check the SB first.

Troubleshooting this error event:   

  1. On a system with a dual SC configuration, fail over to the spare SC and confirm whether the errors persist on the new main SC or are resolved.
  2. Fail back and confirm whether the errors have ceased.
  3. Confirm that only a single board is reporting I2C error events.
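
Step 3 can be checked by hand, or with a short script run against saved showlogs output. The sketch below is illustrative only. It assumes the error markers and board prefixes (SB, IB, RP) seen in this article's examples, and the name boards_with_i2c_errors is hypothetical.

```python
import re
from collections import Counter

# Board names such as SB0, IB6 or RP0; prefixes taken from this article.
BOARD = re.compile(r"\b(?:SB|IB|RP)\d+")
# Substrings treated as I2C error markers (from the example messages).
I2C_MARKERS = ("I2cException", "Could not get i2cSwitch")

def boards_with_i2c_errors(lines):
    """Count I2C error lines per board name."""
    counts = Counter()
    for line in lines:
        if any(marker in line for marker in I2C_MARKERS):
            # A board named twice on one line is counted once for that line.
            for board in set(BOARD.findall(line)):
                counts[board] += 1
    return counts

log_lines = [
    "Feb 28 08:50:55 sc0 Platform.SC: [ID 132274 local0.error] Error "
    "attempting to access /N0/SB0sun.serengeti.HpuFailedException: L1Hpu "
    "getPower: sun.serengeti.I2cException: I2cComm.busyWait: busyWait() "
    "timeout waiting for RRDY, status=0x0022c008, bus=0(/N0/SB0) ring=00 addr=22",
    "Feb 28 08:57:33 sc0 Platform.SC: [ID 912108 local0.error] /N0/SB0: "
    "Could not get i2cSwitch",
    "Feb 28 08:59:12 sc0 Domain-A.SC: [ID 136975 local0.notice] Domain Shell "
    "- A: setkey on: Initiating keyswitch: on, domain A.",
]
counts = boards_with_i2c_errors(log_lines)
print(dict(counts))
print("single board affected:", len(counts) == 1)
```

Running the same check on logs captured before and after the failover shows whether the same board reports errors on both SCs.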

Replacement Advice:

  • If multiple boards report i2c error messages attributed to the same SC, replace the SC.
  • If the same board consistently causes messages on both SCs, the board should be replaced.
    • If errors persist following board replacement the CP slot is suspect.
  • If the same board consistently causes messages on only one SC, the board should be replaced.
    • If errors persist following board replacement the SC should be replaced next.
  • If the issue is resolved through SC failover and then reboot, do not replace hardware.
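
The replacement advice above can be summarized as a small decision function. This is a sketch only, not an official diagnostic tool. The function name, its inputs, and the order in which the rules are checked are assumptions; the article does not say which rule wins when several apply.

```python
def replacement_advice(boards_on_main, boards_on_spare,
                       cleared_by_failover_and_reboot=False):
    """Map observations to the replacement advice listed above.

    boards_on_main / boards_on_spare are sets of board names that logged
    I2C errors while each SC was acting as main (hypothetical inputs).
    """
    if cleared_by_failover_and_reboot:
        return "Do not replace hardware."
    # Same board reports errors on both SCs: the board is suspect.
    on_both = boards_on_main & boards_on_spare
    if on_both:
        return (f"Replace board(s) {sorted(on_both)}; "
                "if errors persist, suspect the CP slot.")
    for sc_name, boards in (("main", boards_on_main),
                            ("spare", boards_on_spare)):
        # Several boards failing against one SC points at that SC.
        if len(boards) > 1:
            return f"Replace the {sc_name} SC."
        # One board failing against one SC only: board first, then SC.
        if len(boards) == 1:
            return (f"Replace board {next(iter(boards))}; "
                    f"if errors persist, replace the {sc_name} SC next.")
    return "No I2C errors observed."

print(replacement_advice({"SB0"}, {"SB0"}))
print(replacement_advice({"SB0", "SB2"}, set()))
print(replacement_advice({"SB0"}, set()))
```

Previous service activity should still be weighed before acting on any of these outcomes.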

 


 

 Additional Information (Internal Only)
It is possible for these error messages to have been caused by board damage as a result of shipping damage, improper handling, packaging defects, or other factors.  In the event that something of this nature has taken place, you might notice a missing component on the board.

The following information is from an archived (retired) Field Information Notice on this topic and is meant to provide further insight into this damage-related issue.  Although this Field Information Notice is no longer active, board damage can still produce this error message, so the information remains relevant:

Previously published as FIN #: I0855-1
SYNOPSIS:     Sun Fire CPU/Memory Boards may become unusable when components are knocked off by improper handling.

REFERENCES:
Bug 15090778: SUNBT4607180 BROKEN RESISTORS ON CPU/MEM BOARD NOT FOUND UNTIL SCFAILOVER
Bug 15091072: SUNBT4614549 SCPOST: TEST COVERAGE IMPROVEMENT TO SUPPORT SC FAILOVER         

   
PROBLEM DESCRIPTION:
Sun Fire CPU/Memory Boards may be damaged when components are knocked off due to improper handling.  Surface Mounted Devices on the bottom of these system boards are susceptible to being knocked off by rough handling.  This damage may make the boards totally unusable or make them subject to failure at a later time.

Affected systems include Sun Fire 3800, 4800, 4810, 6800, 12K and 15K.  Damaged boards can be difficult to identify visually due to the small size of the components.  In several cases, it has been found that resistor R18250 has been knocked from the board.  A picture of a damaged board is attached at the very bottom of this article.

Failure symptoms will vary depending on which component is missing.  In some cases, the board will fail POST and be totally unusable.  In the case of a missing R18250, the board will operate normally as long as SC0 is the main System Controller.  If SC1 is the main System Controller, the domain will not boot.  If the domain boots with SC0 as the main, the problem will not become evident unless the system fails over to SC1.  With R18250 missing, the board cannot communicate with SC1 due to a broken I2C Bus connection.  The SMS/ScApp software will be unable to control the board.

With a missing R18250 component on an F15K board, the following errors can be observed as a result of this problem:

Examples:

I2c read time out - bus: 29, address: 27 -- This is Slot 5 CPU board
I2c read time out - bus: 40, address: 27 -- This is Slot 16 CPU board

 
   and

Unable to retrieve board info for: SB5
Unable to retrieve board info for: SB16

 
On Sun Fire 3800-6800 systems the following errors can be seen as a result of this problem:

May 10 16:01:21 noname Platform.SC: SB1: Could not get i2cSwitch

 

 A missing R18250 resistor has been reported several times in the field.  It is possible that other components are being knocked off as well.  This may result in different errors and symptoms.

This problem is being caused by improper handling of the CPU/Memory Boards in the field.  It is probable that components are knocked off when the board is not carefully removed from its shipping carton, or when it is not properly aligned when being inserted into the system chassis.

It is important that a CPU/Memory board is carefully aligned with the chassis when it is inserted.  It is also important that the CPU/Memory board is removed properly from its shipping carton.  The procedure is clearly indicated on the carton.  The key point is NOT to use the handle while removing the CPU/Memory board from the shipping carton.

CORRECTIVE ACTION:
The following recommendation is provided as a guideline for authorized Enterprise Services Field Representatives who may encounter the above mentioned problem.

Prevent damage to Sun Fire CPU/Memory Board components as follows:

    1. Remove new boards carefully from their shipping cartons.  Follow the instructions on the carton.  Do NOT use the board handle to lift the board from the box.
    2. Install boards carefully into the system chassis.  Follow the procedure given in the System Service Manual.

 


3800, 4810, 4800, 6800, 6900, serengeti, I2C
Previously Published As
80580


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.