Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1994249.1
Update Date:2017-07-12
Keywords:

Solution Type  Problem Resolution Sure

Solution  1994249.1 :   Faulty IO Device with msg ID: PCIEX-8000-DJ  


Related Items
  • SPARC T4-2
Related Categories
  • PLA-Support>Sun Systems>SPARC>CMT>SN-SPARC: T4




In this Document
Symptoms
Changes
Cause
Solution
References


Created from <SR 3-10437421121>

Applies to:

SPARC T4-2 - Version All Versions and later
Information in this document applies to any platform.
A problem has been detected on one of the specified devices or on one of the specified connecting buses.

Symptoms

Customer reported: Faulty IO device
The IO device shows in a fault state.


From -> show faulty
Target | Property | Value
-------------------+-----------------------+-----------------------------------
/SP/faultmgmt/0 | fru | /SYS/MB
/SP/faultmgmt/0/ | class | fault.io.pciex.device-noresp
faults/0 | |
/SP/faultmgmt/0/ | sunw-msg-id | PCIEX-8000-DJ

Changes

The fault is found on ILOM (Service Processor).
A review of the Solaris OS (through Explorer) shows no FMA fault at host level (Solaris FMA), so this fault was probably cleared and an action was taken while the server was being built.

Cause

A problem has been detected on one of the specified devices or on one of the specified connecting buses.

The fault is found on ILOM (Service Processor) and is dated 2015-03-09/15:44:50 (nine days ago).
A review of the Solaris OS (through Explorer) shows no FMA fault at host level (Solaris FMA), so this fault was probably cleared and an action was taken while the server was being built.
 
From -> show faulty
Target | Property | Value
-------------------+-----------------------+-----------------------------------
/SP/faultmgmt/0 | fru | /SYS/MB
/SP/faultmgmt/0/ | class | fault.io.pciex.device-noresp
faults/0 | |
/SP/faultmgmt/0/ | sunw-msg-id | PCIEX-8000-DJ
faults/0 | |
/SP/faultmgmt/0/ | component | /HOST
faults/0 | |
/SP/faultmgmt/0/ | uuid | 485a3164-0313-49e0-9821-f0ae1df31d
faults/0 | | f5
/SP/faultmgmt/0/ | timestamp | 2015-03-09/15:44:50
faults/0 | |
/SP/faultmgmt/0/ | system_serial_number | 1301BDY7C3
faults/0 | |
/SP/faultmgmt/0/ | system_part_number | 31407124+1+1
faults/0 | |
/SP/faultmgmt/0/ | system_name | SPARC T4-2
faults/0 | |
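
The "show faulty" listing above wraps long targets and values onto a second row (for example, the uuid continues as "f5" on the next line). The following is a minimal Python sketch of how such output could be reassembled before copying a UUID into a repair command. The function name parse_show_faulty and the SAMPLE text are illustrative only, not part of ILOM, and the sketch assumes the three-column, pipe-delimited layout shown here, with a continuation row recognized by its empty Property column.

```python
# Hypothetical helper (not an ILOM or Oracle tool): rejoin wrapped rows
# of "show faulty" output. Assumes three pipe-delimited columns and that
# a row with an empty Property column continues the previous row.
SAMPLE = """\
Target | Property | Value
-------------------+-----------------------+-----------------------------------
/SP/faultmgmt/0 | fru | /SYS/MB
/SP/faultmgmt/0/ | class | fault.io.pciex.device-noresp
faults/0 | |
/SP/faultmgmt/0/ | uuid | 485a3164-0313-49e0-9821-f0ae1df31d
faults/0 | | f5
"""


def parse_show_faulty(text):
    """Return {target: {property: value}} with wrapped rows rejoined."""
    rows = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) != 3:
            continue  # separator line or unrelated text
        target, prop, value = cells
        if prop == "Property":
            continue  # header row
        if prop:
            rows.append([target, prop, value])
        elif rows:
            # Continuation row: glue the wrapped pieces back together.
            rows[-1][0] += target
            rows[-1][2] += value
    faults = {}
    for target, prop, value in rows:
        faults.setdefault(target, {})[prop] = value
    return faults


if __name__ == "__main__":
    for target, props in parse_show_faulty(SAMPLE).items():
        print(target, props)
```

With the sample above, the wrapped uuid is rejoined into the same full value that the action plan later passes to fmadm repair.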

From fmadm faulty -a:
--------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
Mar 16 14:45:50 3344c462-f93b-eea0-9824-9103c8cac9a2 SMF-8000-YX major

Problem Status : resolved
Diag Engine : fmd / 1.2
System
  Manufacturer : unknown
  Name : ORCL,SPARC-T4-2
  Part_Number : unknown
  Serial_Number : 1301BDY7C3
  Host_ID : 860fa460

----------------------------------------
Suspect 1 of 1 :
  Fault class : defect.sunos.smf.svc.maintenance
  Certainty : 100%
  Affects : svc:///application/management/hwmgmtd:default

Description : A service failed - a start, stop or refresh method failed.

Response : The service has been placed into the maintenance state.

Impact : svc:/application/management/hwmgmtd:default is unavailable.

Action : Run 'svcs -xv svc:/application/management/hwmgmtd:default' to
  determine the generic reason why the service failed, the location
  of any logfiles, and a list of other services impacted. Please
  refer to the associated reference document at
  http://support.oracle.com/msg/SMF-8000-YX for the latest service
  procedures and policies regarding this diagnosis.

From fmdump:
TIME UUID SUNW-MSG-ID EVENT
Mar 16 14:45:50.7733 3344c462-f93b-eea0-9824-9103c8cac9a2 SMF-8000-YX Diagnosed
Mar 16 14:46:00.1217 3344c462-f93b-eea0-9824-9103c8cac9a2 FMD-8000-4M Repaired
Mar 16 14:46:00.1326 3344c462-f93b-eea0-9824-9103c8cac9a2 FMD-8000-6U Resolved
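
The fmdump listing above shows the host-side event moving from Diagnosed to Repaired to Resolved. As a rough illustration of that check, the Python sketch below records the last event seen for each UUID. The function name last_event_per_uuid is hypothetical, and the sketch assumes the six-field line layout shown here (month, day, time, UUID, message ID, event) with entries listed oldest first.

```python
import re

# Hypothetical helper (not part of Solaris FMA): find the most recent
# event per UUID in fmdump output. Assumes six whitespace-separated
# fields per event line and chronological (oldest-first) ordering.
UUID_RE = re.compile(r"^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$")

FMDUMP_SAMPLE = """\
TIME UUID SUNW-MSG-ID EVENT
Mar 16 14:45:50.7733 3344c462-f93b-eea0-9824-9103c8cac9a2 SMF-8000-YX Diagnosed
Mar 16 14:46:00.1217 3344c462-f93b-eea0-9824-9103c8cac9a2 FMD-8000-4M Repaired
Mar 16 14:46:00.1326 3344c462-f93b-eea0-9824-9103c8cac9a2 FMD-8000-6U Resolved
"""


def last_event_per_uuid(fmdump_text):
    """Return {uuid: last event name seen for that uuid}."""
    state = {}
    for line in fmdump_text.splitlines():
        fields = line.split()
        # Skip the header and anything that is not an event line.
        if len(fields) != 6 or not UUID_RE.match(fields[3]):
            continue
        state[fields[3]] = fields[5]  # later lines overwrite earlier ones
    return state


if __name__ == "__main__":
    for uuid, event in last_event_per_uuid(FMDUMP_SAMPLE).items():
        print(uuid, event)
```

A UUID whose last event is Resolved, as here, is consistent with the host reporting no outstanding FMA fault.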
 

Solution

The fault was cleared after updating the firmware (FW) and the required device drivers. No faults are reported from the SP or from the OS FMA.

Verify that any missing or required device drivers, patches, and system FW are up to date. Clear the faults and reset the SP.
After resetting the SP, verify whether faults are still being reported. If so, and a plug-in card is involved, check for badly seated cards or bent pins.

Otherwise, schedule a repair procedure to replace the affected device(s).

Replace the component that is listed in the fault (e.g., PCIEX#, etc.).

 

ACTION PLAN:

Please clear these faults from ILOM as per the action plan below:

1. To gain access to the ILOM fault management shell:

At the "->" prompt, execute: start /SP/faultmgmt/shell

2. Run fmadm faulty to check the current faults:

fmadm faulty

3. For every UUID entry run:

fmadm repair <uuid>

Example:

fmadm repair 485a3164-0313-49e0-9821-f0ae1df31df5

4. After clearing the faults found, run fmadm faulty again to verify that all faults are cleared in the ILOM fault management shell.

5. Type "exit" to return to ILOM.

6. Finally, run the "show faulty" command to check whether any hardware is still reported as faulted; it should be clear.

7. If the fault is not cleared, manually clear PSH-detected faults using the clear_fault_action property of the set command.

set <FRU> clear_fault_action=true

Example:
-> set /SYS/MB clear_fault_action=true

Reference: How To Clear FMA faults from Solaris[TM] and SC (System Controller) on T1000/T2000 T5120/T5220/T5140/T5240/T5440, T3-1/T3-2/T3-4, T4-1/T4-2/T4-4 (Doc ID 1004229.1)
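
The action plan above is a fixed sequence of commands with one repair per UUID. As a convenience for preparing that sequence, here is a minimal Python sketch that builds the list of commands to type. The function name ilom_clear_plan is hypothetical; it only formats the commands quoted in this document and does not connect to the SP or run anything.

```python
# Hypothetical helper (not an Oracle tool): list the commands from the
# action plan in order. It only builds strings; nothing is executed.
def ilom_clear_plan(uuids, fru=None):
    """Return the ordered commands for clearing the given fault UUIDs.

    uuids: full fault UUIDs as reported by fmadm faulty.
    fru:   optional FRU path (e.g. /SYS/MB) for the fallback step that
           is used only if "show faulty" still reports the fault.
    """
    steps = [
        "start /SP/faultmgmt/shell",  # enter the fault management shell
        "fmadm faulty",               # check current faults
    ]
    steps += [f"fmadm repair {uuid}" for uuid in uuids]
    steps += [
        "fmadm faulty",  # verify that all faults are cleared
        "exit",          # return to the ILOM prompt
        "show faulty",   # confirm no hardware is reported at fault
    ]
    if fru is not None:
        # Fallback: manual clear through the clear_fault_action property.
        steps.append(f"set {fru} clear_fault_action=true")
    return steps


if __name__ == "__main__":
    plan = ilom_clear_plan(
        ["485a3164-0313-49e0-9821-f0ae1df31df5"], fru="/SYS/MB"
    )
    for command in plan:
        print(command)
```

The fallback set command is included only when a FRU path is given, since the plan calls for it only if the fault is still listed after the repair.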
 

References

<NOTE:1021316.1> - PCIEX-8000-DJ - PCIEX subsystem problem
<NOTE:1004229.1> - How To Clear FMA faults from Solaris[TM] and SC (System Controller) on T1000/T2000 T5120/T5220/T5140/T5240/T5440, T3-1/T3-2/T3-4, T4-1/T4-2/T4-4

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.