Asset ID: 1-72-1465634.1
Update Date: 2017-10-11
Solution Type: Problem Resolution Sure
Solution 1465634.1: M3000/M4000/M5000 - OBP probe-all command execution and subsequent boot causes "Last Trap: Data Access Error" and MBU_A or IOU is degraded
Related Items
- Sun SPARC Enterprise M4000 Server
- Sun SPARC Enterprise M5000 Server
- Sun SPARC Enterprise M3000 Server
Related Categories
- PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: Mx000
Applies to:
Sun SPARC Enterprise M3000 Server - Version All Versions to All Versions [Release All Releases]
Sun SPARC Enterprise M4000 Server - Version All Versions to All Versions [Release All Releases]
Sun SPARC Enterprise M5000 Server - Version All Versions to All Versions [Release All Releases]
All Platforms
Symptoms
When 'probe-all' is issued instead of 'probe-scsi-all' at the OBP prompt of an M3000/M4000/M5000, the subsequent boot results in "Last Trap: Data Access Error".
This also degrades the MBU on an M3000, or an IOU on an M4000/M5000, with ereport.chassis.SPARC-Enterprise.asic.pci_sw.fe.
The customer or field engineer may interpret this as a hardware error and replace MBU_A or the IOU unnecessarily.
The M8000 and M9000 do not have this issue.
Changes
Running the OBP command probe-all triggers this issue.
Cause
{0} ok probe-all
{0} ok boot cdrom
Boot device: /pci@0,600000/pci@0/pci@0/scsi@0/disk@4,0:f File and args:
ERROR: /pci@0,600000/pci@0/pci@0/scsi@0: Last Trap: Data Access Error
%TL:1 %TT:32 %TPC:f0038b1c %TnPC:f0038b20 %TSTATE:446a001600
%PSTATE:16 ( IE:1 PRIV:1 PEF:1 )
DSFSR:4280804b ( FV:1 OW:1 PR:1 E:1 TM:1 ASI:80 NC:1 BERR:1 )
DSFAR:fdb16000 DSFPAR:401080100000 D-TAG:0
OR
{0} ok probe-all
{0} ok boot
ERROR: /pci@0,600000/pci@0/pci@0/scsi@0: Last Trap: Data Access Error
%TL:1 %TT:32 %TPC:f0038b1c %TnPC:f0038b20 %TSTATE:446a001600
%PSTATE:16 ( IE:1 PRIV:1 PEF:1 )
DSFSR:4280804b ( FV:1 OW:1 PR:1 E:1 TM:1 ASI:80 NC:1 BERR:1 )
DSFAR:fdb18000 DSFPAR:401080100000 D-TAG:0
XSCF> showstatus
* MBU_A Status:Degraded;
OR
XSCF> showstatus
* IOU#0 Status:Degraded;
XSCF> fmdump -e
TIME CLASS
Jun 08 08:59:43.5452 ereport.chassis.SPARC-Enterprise.asic.pci_sw.fe
For an M3000
XSCF> showlogs error -v -r -M
Date: Jun 08 10:28:31 SGT 2012 Code: 60002500-b9010000-0300000700320000
Status: Warning Occurred: Jun 08 10:28:29.030 SGT 2012
FRU: /MBU_A
Msg: MBU error occurred (TT=0x32)
Diagnostic Code:
00000000 00000000 00000000
54543d30 78333200 00000000 00000000
00000000 00000000 00000000 00000000
UUID: 3055c68e-fc39-4ddf-a199-1e52f442ce1e MSG-ID: SCF-8001-U5
For M4000/M5000
XSCF> showlogs error -v -r -M
Date: Mar 25 09:01:29 SGT 2013 Code: 60002500-b9010000-0300000700320000
Status: Warning Occurred: Mar 25 09:01:27.111 SGT 2013
FRU: /IOU#0
Msg: IOU error occurred (TT=0x32)
Diagnostic Code:
00000000 00000000 00000000
54543d30 78333200 00000000 00000000
00000000 00000000 00000000 00000000
UUID: 3a12a033-03cb-4960-8a93-1c951e69d733 MSG-ID: SCF-8001-U5
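The second row of the Diagnostic Code in both logs above can be read as ASCII text. The following Python sketch is illustrative only: the function name is hypothetical, and treating the row as ASCII is an observation from these two samples, not a documented format. It shows that the row carries the same trap type string that appears in the Msg field.

```python
def decode_diag_words(line: str) -> str:
    """Decode space-separated hex words into ASCII, dropping trailing NUL padding.

    Assumption: the row holds plain ASCII text. This is inferred only from
    the sample logs in this document.
    """
    raw = bytes.fromhex(line.replace(" ", ""))
    return raw.rstrip(b"\x00").decode("ascii")


# Second Diagnostic Code row, copied from the showlogs output above.
print(decode_diag_words("54543d30 78333200 00000000 00000000"))
```

For the sample row this yields TT=0x32, which matches the "(TT=0x32)" in the Msg line and the %TT:32 value in the OBP trap output.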
Solution
1. Do not replace hardware because of this issue.
2. Run clearfault to remove the fault from the system. See the procedure below.
- Note that clearfault is available in non-service mode only in firmware version 1115 and above.
- If you are on a firmware level below 1115, you will not see the clearfault command.
- If you do not already have the latest firmware, perform a firmware upgrade before running clearfault (as of this writing, the latest firmware is 1119).
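The firmware condition above can be expressed as a simple check. This is a minimal Python sketch and not an XSCF tool: the function and constant names are hypothetical, and the firmware level is assumed to be available as a plain integer such as 1115.

```python
# Minimum firmware level at which clearfault is available in non-service
# mode, taken from the note above.
CLEARFAULT_MIN_FIRMWARE = 1115


def clearfault_available(firmware_version: int) -> bool:
    """Return True if clearfault should be visible at this firmware level."""
    return firmware_version >= CLEARFAULT_MIN_FIRMWARE


for version in (1114, 1115, 1119):
    action = "run clearfault" if clearfault_available(version) else "upgrade firmware first"
    print(version, action)
```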
Procedures to clear the fault
1. From XSCF, power off and then power on the affected domain. If you skip this step, the fault is cleared only after a circuit breaker OFF/ON (that is, after removing the power cords).
2. With firmware 1115 and above, run clearfault on /MBU_A or /IOU#0 or /IOU#1.
3. Subsequent boots will work, but the fault reappears if probe-all is issued again.
** Domain 0 is used as the example here; substitute your own domain number.
XSCF> poweroff -d0 -y
DomainIDs to power off:00
Continue? [y|n] :y
00 :Powering off
XSCF> poweron -d0
DomainIDs to power on:00
Continue? [y|n] : y
00 :Powering on
Example: M3000
XSCF> clearfault MBU_A
XSCF> showstatus
No failures found in System Initialization.
XSCF>
Example: M4000/M5000
XSCF> clearfault /IOU#0
XSCF> showstatus
No failures found in System Initialization.
XSCF>
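The clearing step can also be sketched in script form. The following Python sketch is illustrative only and is not part of XSCF: it takes showstatus output that has been captured as text, lists the FRUs marked Degraded, and builds the matching clearfault command lines. The function names and the regular expression are assumptions based solely on the sample output shown in this document; the leading "/" on the FRU follows step 2 of the procedure.

```python
import re
from typing import List

# Matches lines such as "* MBU_A Status:Degraded;" (format assumed from the
# showstatus samples in this document).
DEGRADED_RE = re.compile(r"^\s*\*?\s*(?P<fru>\S+)\s+Status:Degraded;", re.MULTILINE)


def degraded_frus(showstatus_output: str) -> List[str]:
    """Return the FRU names reported as Degraded in captured showstatus text."""
    return [m.group("fru") for m in DEGRADED_RE.finditer(showstatus_output)]


def clearfault_commands(showstatus_output: str) -> List[str]:
    """Build one clearfault command line per degraded FRU."""
    return ["clearfault /" + fru.lstrip("/") for fru in degraded_frus(showstatus_output)]


sample = "* IOU#0 Status:Degraded;\n"
for command in clearfault_commands(sample):
    print(command)
```

The generated lines are meant for review before they are typed at the XSCF prompt. Do not run them unattended.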
References
<NOTE:1007101.1> - Sun SPARC(R)Enterprise M3000/M4000/M5000/M8000/M9000 (OPL) Servers: Fault clearing and LEDs behavior