Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1566083.1
Update Date:2018-03-06

Solution Type  Technical Instruction Sure

Solution  1566083.1 :   Boot failure due to "Multibit ECC Errors Were Detected On The RAID Controller" reported on SPARC T3-1, SPARC T3-2, SPARC T4-1, SPARC T4-2 and Netra


Related Items
  • SPARC T3-1
  • SPARC T4-2
  • Netra SPARC T4-2 Server
  • SPARC T3-2
  • SPARC T4-1

Related Categories
  • PLA-Support>Sun Systems>SPARC>CMT>SN-SPARC: T4

In this Document
Goal
Solution
References


Created from <SR 3-7453483351>

Applies to:

SPARC T4-2 - Version All Versions and later
SPARC T3-1 - Version All Versions and later
SPARC T3-2 - Version All Versions and later
SPARC T4-1 - Version All Versions and later
Netra SPARC T4-2 Server - Version All Versions and later
Information in this document applies to any platform.

Goal

Identify the root cause of a system boot failure that reports the error message "Multibit ECC errors were detected on the RAID Controller".

Solution

The system may report error messages similar to the following and abort the auto-boot sequence:

"Multibit ECC errors were detected on the RAID controller.
 The DIMM on the controller needs replacement.
 Please contact technical support to resolve this issue.
 If you continue, data corruption can occur.
 Press 'X' to continue or else power off the system and replace the DIMM module and reboot. If you have replaced the DIMM press 'X' to continue

 F/W is in fault state
 ERROR: /pci@400/pci@2/pci@0/pci@c/LSI,mrsas@0: Last Trap Fast Data Access MMU Miss"

Depending on the installed firmware package, the following error messages may also be reported:

"Fatal        Multi-bit ECC error ..."
"Warning    Single-bit ECC error ..."
"Critical    Single-bit ECC error: ..."


The error messages shown above point to a faulty DIMM/memory module on the RAID HostBusAdapter (HBA) installed in a PCIE slot. The DIMM/memory module is not a replaceable unit, so the faulty RAID HostBusAdapter itself must be replaced.
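
When console or log output has been captured to a file, the ECC messages quoted above can be picked out mechanically. The following is a minimal sketch only: the function name `find_ecc_messages` and the pattern are illustrative assumptions, the pattern covers only the message variants quoted in this document, and it does not tell a RAID controller ECC error apart from an ECC error reported by another component.

```python
import re

# Assumption: matches only the wordings quoted in this document
# ("Multibit ECC errors", "Multi-bit ECC error", "Single-bit ECC error").
ECC_PATTERN = re.compile(r"(multi-?bit|single-bit)\s+ECC\s+errors?", re.IGNORECASE)


def find_ecc_messages(lines):
    """Return (kind, line) pairs for every line that mentions an ECC error.

    kind is "multibit" or "singlebit"; line is the stripped input line.
    """
    hits = []
    for line in lines:
        match = ECC_PATTERN.search(line)
        if match:
            # Normalize "Multi-bit" / "Multibit" to one spelling.
            kind = match.group(1).lower().replace("-", "")
            hits.append((kind, line.strip()))
    return hits
```

Any "multibit" hit in output from the RAID HBA corresponds to the boot failure described in this document.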

Use Document 1005907.1 "Matrix of Recognized Device Paths for SPARC systems" to identify the PCIE slot.

In the error message above, the path is /pci@400/pci@2/pci@0/pci@c, which is PCIE slot 3 in either the SPARC T3-1 or the SPARC T4-1.
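
The path lookup can be sketched in Python as follows. This is an illustration under stated assumptions, not an Oracle tool: `split_device_path` and `KNOWN_SLOTS` are hypothetical names, and the table holds only the single path-to-slot mapping stated in this document. Every other path still has to be looked up in Document 1005907.1.

```python
import re

# Assumption: only the one mapping given in this document is listed here.
KNOWN_SLOTS = {
    "/pci@400/pci@2/pci@0/pci@c": "PCIE slot 3 (SPARC T3-1 or T4-1)",
}

# Matches the OpenBoot line "ERROR: <device path>: <text>".
ERROR_LINE = re.compile(r"ERROR:\s+(?P<path>/\S+?):\s")


def split_device_path(line):
    """Return (slot_path, node_name) from an 'ERROR: <path>: ...' line.

    slot_path is the device path without its last component; node_name is
    the last component without its '@unit' suffix. Returns None when the
    line does not contain such an error.
    """
    match = ERROR_LINE.search(line)
    if match is None:
        return None
    parent, _, leaf = match.group("path").rpartition("/")
    node_name = leaf.split("@", 1)[0]
    return parent, node_name
```

For the error line quoted earlier, the function returns the slot path `/pci@400/pci@2/pci@0/pci@c` and the node name `LSI,mrsas`. The slot path is the key to look up in Document 1005907.1, and the node name is the search term for the PCI Card Identification List.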

 

The type of the faulty PCI HBA can be identified using Document 1373995.1, the "PCI Card Identification List".

Searching for LSI,mrsas in Document 1373995.1 identifies this HBA as 

LSI,mrsas-pciex1000,79                   LSI,2108_4 SGX-SAS6-R-INT-Z       Niwot-Int      375-3701 Internal 8 port RAID controller 

 

After the RAID HostBusAdapter has been replaced, the disk configuration must be imported: the NVRAM on the RAID HostBusAdapter stores the firmware and the configuration data, so the new RAID HostBusAdapter starts with an empty configuration. To import the configuration, use one of the following:

- MegaPCLI SAS RAID Management Tool (Fcode) : "cli -CfgForeign -Import"
- MegaCLI SAS RAID Management Tool  (CLI)   : "<path of MegaRAID>/MegaCli -CfgForeign -Import"
- MegaRAID Storage Management Tool  (GUI)   : "<path of MegaRaidStorageManager>/startupui.sh, Go To > Controller > Import Foreign Configuration"

Which of the above options can be used depends on the situation (the status of the system and of the HBA replacement). For more information about the tools:

- MegaPCLI SAS RAID Management Tool (Fcode) : "cli -h"
- MegaCLI SAS RAID Management Tool  (CLI)   : "<path of MegaRAID>/MegaCli -h"
- MegaRAID Storage Management Tool  (GUI)   : "<path of MegaRaidStorageManager>/startupui.sh, Go To > Help"
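
If the MegaCLI commands above are driven from a script, the argument lists might be built as in this minimal sketch. The function `megacli_command` and the directory `/opt/MegaRAID` used in the usage note are hypothetical stand-ins for "<path of MegaRAID>". The options are passed through exactly as quoted above, and no further options are assumed.

```python
import posixpath

# Options exactly as quoted in this document for the MegaCLI (CLI) tool.
MEGACLI_OPTIONS = {
    "import": ("-CfgForeign", "-Import"),
    "help": ("-h",),
}


def megacli_command(megaraid_dir, action):
    """Build the argument list for one MegaCli invocation.

    megaraid_dir stands for "<path of MegaRAID>"; action is "import" or "help".
    """
    return [posixpath.join(megaraid_dir, "MegaCli"), *MEGACLI_OPTIONS[action]]
```

For example, `megacli_command("/opt/MegaRAID", "import")` yields a list that could be handed to `subprocess.run` on a host where MegaCli is actually installed in that directory.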

 

Example:

  • Select the HBA:

    {0} ok select /pci@400/pci@2/pci@0/pci@c/LSI,mrsas@0

        If foreign configuration(s) are found, the following message will be displayed:

        Foreign configuration(s) found on adapter.
        Press any key to continue or 'C' load the configuration utility, or 'F' to import foreign configuration(s) and continue.

        If 'F' is pressed, the foreign configuration(s) are imported automatically. If 'C' is pressed, the foreign configuration(s) must be imported manually (see below).

  • Display HBA logical and physical drive information

    {0} ok cli -LdPdInfo

        Adapter #0
        Number of Virtual Disks: 1
        Virtual Disk: 0 (Target Id: 0) 

  • Scan for a foreign configuration on the disks

    {0} ok cli -CfgForeign -Scan

        There are 1 foreign configuration(s) on controller 0.
        Exit Code = 0x0

  • Display the foreign configuration found on the disks

    {0} ok cli -CfgForeign -Dsply

        There are 1 foreign configuration(s) on controller 0.
        Exit Code = 0x0

  • Import a foreign configuration

    {0} ok cli -CfgForeign -Import

  • Display HBA logical and physical drive information

    {0} ok cli -LdPdInfo

        Adapter #0
        Number of Virtual Disks: 1
        Virtual Disk: 0 (Target Id: 0)

  • Display all HBA information (look at the Error Counters for any Correctable/Uncorrectable Memory Errors)

    {0} ok cli -AdpAllInfo

        Adapter #0

  • Unselect the HBA

    {0} ok unselect-dev

  • Disable auto-boot, reset the system, and run probe-scsi-all to display the imported disks / volumes

    {0} ok setenv auto-boot? false

    {0} ok reset-all

    {0} ok probe-scsi-all
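
When the output of the example commands above is saved from a console session, the key values can be extracted with a short parser. This is a minimal sketch assuming the output wording is exactly as shown in this example. The names `parse_foreign_scan` and `virtual_disk_count` are illustrative, and other firmware versions may word the output differently.

```python
import re

# Patterns follow the sample output shown in this document.
FOREIGN_COUNT = re.compile(
    r"There (?:is|are) (\d+) foreign configuration\(s\) on controller (\d+)"
)
EXIT_CODE = re.compile(r"Exit Code\s*=\s*(0x[0-9A-Fa-f]+)")
VIRTUAL_DISKS = re.compile(r"Number of Virtual Disks:\s*(\d+)")


def parse_foreign_scan(output):
    """Parse the output of a foreign-configuration scan or display.

    Returns a dict with 'controller', 'foreign_configs' and 'exit_code'
    (None when no exit code line is present), or None when the output
    does not report a foreign configuration count.
    """
    count = FOREIGN_COUNT.search(output)
    if count is None:
        return None
    code = EXIT_CODE.search(output)
    return {
        "controller": int(count.group(2)),
        "foreign_configs": int(count.group(1)),
        "exit_code": int(code.group(1), 16) if code else None,
    }


def virtual_disk_count(output):
    """Return the 'Number of Virtual Disks' value from -LdPdInfo output, or None."""
    match = VIRTUAL_DISKS.search(output)
    return int(match.group(1)) if match else None
```

A non-zero `foreign_configs` value before the import indicates that there is a configuration to import. After the import, `virtual_disk_count` on the `-LdPdInfo` output gives the number of virtual disks the new HBA now reports.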

References

<NOTE:1005907.1> - SPARC Platforms: Matrix of Recognized Device Paths
<NOTE:1373995.1> - PCI Card Identification List
<NOTE:1397311.1> - Sun Storage 6Gb SAS PCIe RAID HBA Internal Card (Niwot) Home Page

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.