Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1523406.1
Update Date:2018-02-14

Solution Type  Problem Resolution Sure

Solution  1523406.1 :   Server can't boot with LSI RAID Controller error messages on a Sun Fire V440/V445 and Netra 440  


Related Items
  • Sun Fire V440 Server
  • Sun Netra 440 Server
  • Sun Fire V445 Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Workgroup Servers>SN-SPARC: SF-V4x0




In this Document
Symptoms
Cause
Solution
References


Applies to:

Sun Netra 440 Server - Version All Versions to All Versions [Release All Releases]
Sun Fire V440 Server - Version All Versions to All Versions [Release All Releases]
Sun Fire V445 Server - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms






This document describes how to handle LSI RAID Controller 'failed' messages when a Sun Fire V440/V445 or Sun Netra 440 cannot boot from any of its internal disks.

Error Messages:
All of these messages can be seen at the ok prompt (OBP level). They usually appear at the end of POST, but they do not necessarily mean that the SCSI bus itself has failed. See the Cause and Solution sections below.

Example 1:

Sun Fire V440, No Keyboard
Copyright 1998-2004 Sun Microsystems, Inc.  All rights reserved.
OpenBoot 4.13.0, 4096 MB memory installed, Serial #57067141.
Ethernet address 0:3:ba:66:c6:85, Host ID: 8366c685.

ERROR: LSI1030: timeout waiting for interrupt status.
ERROR: LSI1030 MPT Firmware, receive-message: issue-port-enable failed.
ERROR: LSI1030: timeout waiting for interrupt status.
ERROR: LSI1030 MPT Firmware, receive-message: issue-port-enable failed.

{1} ok probe-scsi-all
/pci@1f,700000/scsi@2,1
ERROR: LSI1030: timeout waiting for interrupt status.
ERROR: LSI1030 MPT Firmware, receive-message: issue-port-enable failed.
ERROR: LSI1030: timeout waiting for interrupt status.
ERROR: LSI1030 MPT Firmware, receive-message: issue-port-enable failed.
Can't open SCSI host adapter

Example 2:

{3} ok boot boot
ERROR: LSI1030: unknown error returned by IOC: scsi-status = 00, ioc-status = 004b
ERROR: LSI1030: execute-command: I/O timed out.

ERROR: LSI1030: unknown error returned by IOC: scsi-status = 00, ioc-status = 004b
ERROR: LSI1030: unknown error returned by IOC: scsi-status = 00, ioc-status = 004b
ERROR: LSI1030: unknown error returned by IOC: scsi-status = 00, ioc-status = 004b
Boot device: /pci@1f,700000/scsi@2/disk@1,0  File and args: boot
The file just loaded does not appear to be executable.
{3} ok probe-scsi-all
/pci@1f,700000/scsi@2,1

/pci@1f,700000/scsi@2
ERROR: LSI1030: unknown error returned by IOC: scsi-status = 00, ioc-status = 004b

Example 3:

Error: /pci@1f,700000/scsi@2,1 selftest failed, return code = 1
Testing /pci@1f,700000/scsi@2
ERROR: LSI1030: timeout waiting for interrupt status.

ERROR: LSI1030 MPT Firmware, receive-message: issue-port-enable failed.

ERROR: LSI1030: timeout waiting for interrupt status.

ERROR: LSI1030 MPT Firmware, receive-message: issue-port-enable failed.


  ERROR   : 1030 scsi open failed.
  DEVICE  : /pci@1f,700000/scsi@2
  SUBTEST : selftest:unit-rdy-test
  MACHINE : Sun Fire V440
  SERIAL# : 53766073
  DATE    : 01/03/2013 18:18:09  GMT
  CONTROLS: diag-level=max test-args=

Error: /pci@1f,700000/scsi@2 selftest failed, return code = 1


ERROR: OpenBoot Diagnostics failed.

ERROR: OpenBoot Diagnostics failed
WARNING: Device /pci@1f,700000/scsi@2,1 already marked with 'status' == fail
WARNING: Device /pci@1f,700000/scsi@2 already marked with 'status' == fail

SC Alert: /pci@1f,700000/scsi@2,1 fail

SC Alert: /pci@1f,700000/scsi@2 fail
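
When a long console capture has to be reviewed, it can help to count which of the message types shown in the examples above actually occur. The following is a minimal Python sketch, not an Oracle tool. The regular expressions are copied from the example messages in this document; the function names and the short labels are hypothetical and chosen only for illustration.

```python
import re

# Patterns taken from the example messages above.
# The labels on the right are illustrative names, not Oracle terminology.
PATTERNS = [
    (re.compile(r"LSI1030: timeout waiting for interrupt status"), "interrupt-timeout"),
    (re.compile(r"issue-port-enable failed"), "port-enable-failed"),
    (re.compile(r"ioc-status = [0-9a-fA-F]{4}"), "ioc-error"),
    (re.compile(r"execute-command: I/O timed out"), "io-timeout"),
    (re.compile(r"Can't open SCSI host adapter"), "adapter-open-failed"),
    (re.compile(r"selftest failed"), "selftest-failed"),
]


def classify(line):
    """Return the label of the first matching pattern, or None."""
    for pattern, label in PATTERNS:
        if pattern.search(line):
            return label
    return None


def summarize(console_text):
    """Count how often each known message type appears in a console capture."""
    counts = {}
    for line in console_text.splitlines():
        label = classify(line)
        if label is not None:
            counts[label] = counts.get(label, 0) + 1
    return counts
```

Calling `summarize()` on saved console text returns a dictionary of label counts. The counts only show which messages were seen; they do not identify the failing part, which still has to be isolated with the steps in the Solution section.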

 

Cause

These failure messages from the LSI RAID Controller can have several causes:

1. The SCSI bus is not seeing the disks.
2. One or more disks have failed, so the SCSI bus is flooded with errors and the SCSI controller cannot properly access any device.

3. The SCSI backplane and/or the SCSI backplane cable is failing, which affects the SCSI bus and blocks access to the disks.
4. The LSI RAID Controller integrated into the motherboard is failing. This is the least likely cause.

Any of these causes will prevent the server from booting from an internal disk.

Solution

At OBP Level, follow the steps below in order to isolate the failure:

1. Reset the SCSI bus and check if the disks can be seen. Run the following commands at the ok prompt:

  1. ok reset-all
  2. ok probe-scsi-all

2. If the disks cannot be seen, try to isolate the failure to one of the internal disks, and replace any internal disk that has failed.

  1. Visually check the internal disks for fault lights and replace any disk that shows a fault.
  2. Boot from alternate Solaris media and test the internal disks with:
    # format
    * Analyze
    * Read
  3. Replace any disk that does not respond, or that hangs during the format/analyze/read test.


3. If the server cannot boot from alternate media, do the following at the OBP level:

  1. Un-seat all internal disks.
  2. Set the OBP parameter 'auto-boot?' to false for troubleshooting (restore it to true once troubleshooting is complete):
    ok setenv auto-boot? false
  3. Reset the server:
    ok reset-all
  4. Re-seat the internal disks one by one. After each disk is re-seated, reset the system, then probe the SCSI bus again:
    ok probe-scsi-all
  5. Watch the output from step 4. A disk that is slow to respond may be bad. Remove that disk, probe the SCSI bus again, and replace the suspect disk that was slow to respond to the probe.
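
The one-by-one isolation in step 3 is easier to follow if each probe result is written down as it is observed. The following Python sketch shows one way to record those observations and list the suspect slots. It is only a worksheet, not part of the procedure: the `ProbeResult` fields, the `suspect_slots` function, and the 30-second `slow_threshold` are all assumptions made for illustration, since the procedure itself does not define what counts as "slow".

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    slot: int        # disk bay that was just re-seated
    errors: bool     # True if probe-scsi-all printed LSI1030 errors
    seconds: float   # how long the probe took, as timed by the operator


def suspect_slots(results, slow_threshold=30.0):
    """Return the slots whose probe showed errors or was slow to respond.

    slow_threshold is an arbitrary illustrative value, not an Oracle figure.
    """
    suspects = []
    for result in results:
        if result.errors or result.seconds > slow_threshold:
            suspects.append(result.slot)
    return suspects


# Example worksheet: slot 1 produced errors, slot 2 responded slowly.
observations = [
    ProbeResult(slot=0, errors=False, seconds=5.0),
    ProbeResult(slot=1, errors=True, seconds=6.0),
    ProbeResult(slot=2, errors=False, seconds=45.0),
    ProbeResult(slot=3, errors=False, seconds=4.0),
]
print(suspect_slots(observations))
```

Any slot reported here is only a candidate. As step 5 describes, confirm it by removing that disk and probing the SCSI bus again before replacing it.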

4. At this point, contact Oracle Support for either of these options:

  1. Replace the faulty disk(s) isolated in the previous steps.
  2. Get assistance resolving the issue if no disk was found to be faulty.

 

Additional troubleshooting steps:
5. If no internal disk can be isolated as faulty, replace the SCSI backplane cable to rule out a cable failure.

6. If the previous steps do not clear the errors, replace the SCSI backplane.

7. If the issue persists, proceed with motherboard replacement.
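
The replacement order in steps 4 through 7 can be summarized as a small decision function. This is a hedged Python sketch of that order only. The function name, its arguments, and the action strings are hypothetical, and it does not replace the judgement of Oracle Support.

```python
# Replacement order taken from steps 5-7 above (least invasive part first).
ESCALATION = [
    "replace SCSI backplane cable",
    "replace SCSI backplane",
    "replace motherboard",
]


def next_action(faulty_disks, replacements_done):
    """Suggest the next step given what has been found and already replaced.

    faulty_disks: slots of disks isolated as faulty (empty if none).
    replacements_done: actions from ESCALATION that were already performed.
    """
    if faulty_disks:
        return "replace disk(s): " + ", ".join(str(d) for d in faulty_disks)
    for action in ESCALATION:
        if action not in replacements_done:
            return action
    return "contact Oracle Support"
```

For example, `next_action([], ["replace SCSI backplane cable"])` returns the backplane replacement, which mirrors step 6. A faulty disk always takes priority because disk failure is a more likely cause than the cable, backplane, or motherboard.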

 

Note that a given hardware part may be a FRU (field-replaceable unit) or a CRU (customer-replaceable unit), depending on the server model.

Additional Info:
How to Remove and Replace V440/Netra 440 Disk Drive:ATR:777:0 [Video] (Doc ID 1347685.1)
How to Replace a V445 Hard Disk Drive:ATR:1115:0 (Doc ID 1310124.1)
VSP Hardware RAID Training/ Knowledge (Doc ID 1621613.1)

V440 Server Parts Installation and Removal Guide and Admin Guide
http://download.oracle.com/docs/cd/E19088-01/v440.srvr/index.html

Netra 440 Server Service Manual and Server System Administration Guide
http://download.oracle.com/docs/cd/E19102-01/n440.srvr/index.html

Sun Fire V445 Server Service Manual
http://download.oracle.com/docs/cd/E19088-01/v445.srvr/index.html

 


  Copyright © 2018 Oracle, Inc.  All rights reserved.