Asset ID: 1-72-1994249.1
Update Date: 2017-07-12
Solution Type: Problem Resolution Sure Solution
1994249.1: Faulty I/O Device with Message ID PCIEX-8000-DJ
Related Categories:
- PLA-Support>Sun Systems>SPARC>CMT>SN-SPARC: T4
Created from <SR 3-10437421121>
Applies to:
SPARC T4-2 - Version All Versions and later
Information in this document applies to any platform.
A problem has been detected on one of the specified devices or on one of the specified connecting buses.
Symptoms
The customer reported a faulty I/O device.
The I/O device is shown in a faulted state on the Service Processor.
Output of show faulty at the ILOM "->" prompt:
Target | Property | Value
-------------------+-----------------------+-----------------------------------
/SP/faultmgmt/0 | fru | /SYS/MB
/SP/faultmgmt/0/ | class | fault.io.pciex.device-noresp
faults/0 | |
/SP/faultmgmt/0/ | sunw-msg-id | PCIEX-8000-DJ
Changes
The fault is reported only on ILOM (Service Processor). The Solaris OS data (collected with Explorer) shows no corresponding FMA fault at the host level, so the fault was probably cleared, and corrective action taken, while the server was being built.
Cause
A problem has been detected on one of the specified devices or on one of the specified connecting buses.
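The fault is logged on ILOM (Service Processor) with the timestamp 2015-03-09/15:44:50, nine days before this analysis.
A review of the Solaris OS data (collected with Explorer) shows no open FMA fault at the host level (Solaris FMA): the only host-level diagnosis, SMF-8000-YX for the hwmgmtd service, has the status "resolved". The fault was therefore probably cleared, and corrective action taken, while the server was being built, but it is still recorded on the SP.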
Full ILOM output of -> show faulty:
Target | Property | Value
-------------------+-----------------------+-----------------------------------
/SP/faultmgmt/0 | fru | /SYS/MB
/SP/faultmgmt/0/ | class | fault.io.pciex.device-noresp
faults/0 | |
/SP/faultmgmt/0/ | sunw-msg-id | PCIEX-8000-DJ
faults/0 | |
/SP/faultmgmt/0/ | component | /HOST
faults/0 | |
/SP/faultmgmt/0/ | uuid | 485a3164-0313-49e0-9821-f0ae1df31d
faults/0 | | f5
/SP/faultmgmt/0/ | timestamp | 2015-03-09/15:44:50
faults/0 | |
/SP/faultmgmt/0/ | system_serial_number | 1301BDY7C3
faults/0 | |
/SP/faultmgmt/0/ | system_part_number | 31407124+1+1
faults/0 | |
/SP/faultmgmt/0/ | system_name | SPARC T4-2
faults/0 | |
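ILOM wraps long cells in show faulty output onto a continuation line whose Property column is empty; in the output above, the uuid is printed as 485a3164-0313-49e0-9821-f0ae1df31d with f5 on the next line. The full UUID is what fmadm repair expects in the action plan, so it has to be rejoined. The following Python sketch does this on a workstation against captured text. The helper name parse_show_faulty is hypothetical (not an Oracle tool), and it assumes that cells split cleanly on "|" and that wrapped text continues without an inserted space, which holds for the UUID shown here.

```python
def parse_show_faulty(text):
    """Rebuild (target, property, value) rows from ILOM 'show faulty' output.

    ILOM wraps long Target and Value cells onto a continuation line whose
    Property cell is empty; those fragments are appended to the row above.
    Assumption: wrapped text continues without a separating space.
    """
    rows = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) != 3 or cells[0] == "Target":
            continue  # skip the header, the "---+---" separator and other text
        target, prop, value = cells
        if prop:
            rows.append([target, prop, value])
        elif rows:
            rows[-1][0] += target  # e.g. "/SP/faultmgmt/0/" + "faults/0"
            rows[-1][2] += value   # e.g. "...f0ae1df31d" + "f5"
    return [tuple(row) for row in rows]


# Sample input: an excerpt of the show faulty output above.
captured = """\
Target | Property | Value
-------------------+-----------------------+-----------------------------------
/SP/faultmgmt/0 | fru | /SYS/MB
/SP/faultmgmt/0/ | class | fault.io.pciex.device-noresp
faults/0 | |
/SP/faultmgmt/0/ | uuid | 485a3164-0313-49e0-9821-f0ae1df31d
faults/0 | | f5
"""
fault = {prop: value for target, prop, value in parse_show_faulty(captured)
         if target == "/SP/faultmgmt/0/faults/0"}
print(fault["uuid"])  # 485a3164-0313-49e0-9821-f0ae1df31df5
```

Joining by plain concatenation keeps the sketch simple; if a wrapped value ever breaks at a space, that space would be lost and the value should be checked by eye.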
From fmadm faulty -a on the Solaris host:
--------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
Mar 16 14:45:50 3344c462-f93b-eea0-9824-9103c8cac9a2 SMF-8000-YX major
Problem Status : resolved
Diag Engine : fmd / 1.2
System
Manufacturer : unknown
Name : ORCL,SPARC-T4-2
Part_Number : unknown
Serial_Number : 1301BDY7C3
Host_ID : 860fa460
----------------------------------------
Suspect 1 of 1 :
Fault class : defect.sunos.smf.svc.maintenance
Certainty : 100%
Affects : svc:///application/management/hwmgmtd:default
Description : A service failed - a start, stop or refresh method failed.
Response : The service has been placed into the maintenance state.
Impact : svc:/application/management/hwmgmtd:default is unavailable.
Action : Run 'svcs -xv svc:/application/management/hwmgmtd:default' to
determine the generic reason why the service failed, the location
of any logfiles, and a list of other services impacted. Please
refer to the associated reference document at
http://support.oracle.com/msg/SMF-8000-YX for the latest service
procedures and policies regarding this diagnosis.
From fmdump on the Solaris host:
TIME UUID SUNW-MSG-ID EVENT
Mar 16 14:45:50.7733 3344c462-f93b-eea0-9824-9103c8cac9a2 SMF-8000-YX Diagnosed
Mar 16 14:46:00.1217 3344c462-f93b-eea0-9824-9103c8cac9a2 FMD-8000-4M Repaired
Mar 16 14:46:00.1326 3344c462-f93b-eea0-9824-9103c8cac9a2 FMD-8000-6U Resolved
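The fmdump log above supports the conclusion that no host-level fault is open: the last event recorded for each UUID decides its state. A minimal Python sketch of that check follows. The helper name latest_state is hypothetical; it assumes the default one-line fmdump format shown above, with events listed in chronological order, and it treats a last event of Repaired or Resolved as closed (an assumption, not an FMA definition).

```python
import re

# One event line of default fmdump output: TIME (month, day, clock),
# UUID, SUNW-MSG-ID and EVENT.
EVENT_LINE = re.compile(
    r"^\w{3}\s+\d{1,2}\s+[\d:.]+\s+"
    r"([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s+\S+\s+(\S+)$"
)
CLOSED = {"Repaired", "Resolved"}  # assumption: these events end a case


def latest_state(fmdump_text):
    """Map each UUID to the EVENT of its last line (lines assumed in time order)."""
    state = {}
    for line in fmdump_text.splitlines():
        match = EVENT_LINE.match(line.strip())
        if match:
            uuid, event = match.groups()
            state[uuid] = event  # a later line overwrites an earlier one
    return state


# Sample input: the fmdump output shown above.
log = """\
TIME UUID SUNW-MSG-ID EVENT
Mar 16 14:45:50.7733 3344c462-f93b-eea0-9824-9103c8cac9a2 SMF-8000-YX Diagnosed
Mar 16 14:46:00.1217 3344c462-f93b-eea0-9824-9103c8cac9a2 FMD-8000-4M Repaired
Mar 16 14:46:00.1326 3344c462-f93b-eea0-9824-9103c8cac9a2 FMD-8000-6U Resolved
"""
states = latest_state(log)
still_open = [uuid for uuid, event in states.items() if event not in CLOSED]
print(states)      # {'3344c462-f93b-eea0-9824-9103c8cac9a2': 'Resolved'}
print(still_open)  # []
```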
Solution
The fault was cleared after the firmware and the required device drivers were updated. No faults are now reported by the SP or by Solaris FMA.
Verify that all required device drivers, patches, and system firmware are installed and up to date. Then clear the faults and reset the SP.
After resetting the SP, check whether faults are still reported. If they are and a plug-in card is involved, check for badly seated cards or bent pins.
Otherwise, schedule a repair procedure to replace the affected device(s), that is, the component listed in the fault (for example, PCIEX#).
ACTION PLAN:
Clear these faults from ILOM as follows:
1. Open the ILOM fault management shell. At the "->" prompt, run:
start /SP/faultmgmt/shell
2. List the current faults:
fmadm faulty
3. For each UUID listed, run:
fmadm repair <uuid>
Example:
fmadm repair 485a3164-0313-49e0-9821-f0ae1df31df5
4. After repairing the faults, run fmadm faulty again in the ILOM fault management shell to verify that all faults are cleared.
5. Type "exit" to return to the ILOM CLI.
6. Finally, run "show faulty" to check whether any hardware is still reported as faulted. The list should be empty.
7. If a fault is not cleared, clear the PSH (Predictive Self-Healing) detected fault manually by setting the clear_fault_action property of the FRU with the set command:
set <FRU> clear_fault_action=true
Example:
-> set /SYS/MB clear_fault_action=true
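Step 3 becomes tedious when many UUIDs are listed. Because the ILOM fault management shell offers only a limited set of commands (such as fmadm and fmdump), one option is to capture the fmadm faulty output, generate the repair commands on a workstation, and paste them back into the shell. The sketch below assumes, without confirmation from this note, that each fault row carries the UUID followed by its message ID (for example PCIEX-8000-DJ), as in the Solaris fmadm faulty output shown in the Cause section; the helper name repair_commands is hypothetical.

```python
import re

# A fault row holds the UUID followed by its SUNW-MSG-ID (e.g. PCIEX-8000-DJ).
# Matching on that pair, rather than on the timestamp, is an assumption that
# lets the same pattern accept both "Mar 16 14:45:50" and
# "2015-03-09/15:44:50" style TIME columns.
FAULT_ROW = re.compile(
    r"\b([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s+[A-Z0-9]+-\d{4}-[0-9A-Z]+\b"
)


def repair_commands(fmadm_faulty_text):
    """Return one 'fmadm repair <uuid>' command per distinct fault UUID."""
    uuids = []
    for line in fmadm_faulty_text.splitlines():
        match = FAULT_ROW.search(line)
        if match and match.group(1) not in uuids:
            uuids.append(match.group(1))  # keep first-seen order
    return [f"fmadm repair {uuid}" for uuid in uuids]


# Sample input: the summary rows of the fmadm faulty -a output in the Cause section.
captured = """\
--------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
Mar 16 14:45:50 3344c462-f93b-eea0-9824-9103c8cac9a2 SMF-8000-YX major
"""
for command in repair_commands(captured):
    print(command)  # fmadm repair 3344c462-f93b-eea0-9824-9103c8cac9a2
```

Review the generated list before pasting it: fmadm repair should only be run for faults that have been addressed (firmware, drivers, reseating, or replacement) as described in the Solution section.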
Reference: How To Clear FMA faults from Solaris[TM] and SC (System Controller) on T1000/T2000 T5120/T5220/T5140/T5240/T5440, T3-1/T3-2/T3-4, T4-1/T4-2/T4-4 (Doc ID 1004229.1)
References
<NOTE:1021316.1> - PCIEX-8000-DJ - PCIEX subsystem problem
<NOTE:1004229.1> - How To Clear FMA faults from Solaris[TM] and SC (System Controller) on T1000/T2000 T5120/T5220/T5140/T5240/T5440, T3-1/T3-2/T3-4, T4-1/T4-2/T4-4