Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure
Solution 2079496.1 : FC HBA PCI Card Not Detected by Solaris After T5-4 Firmware Upgrade
Created from <SR 3-11689059251>

Applies to:
SPARC T5-4 - Version All Versions and later
Qlogic FC HBA - Version All Versions and later
Sun Storage 8Gb FC PCIe HBA, 2 Port, Emulex - Version All Versions and later
Information in this document applies to any platform.

Symptoms

A SPARC T5-4 control domain running Solaris 11.1 has four FC HBAs installed in PCIe slots 5, 9, 11 and 13, each assigned to a guest domain. After a T5-4 firmware upgrade, the card in slot 5 is no longer visible in the prtdiag output:

======================================== IO Devices =======================================
Slot +            Bus   Name +                               Model      Max Speed  Cur Speed
Status            Type  Path                                            /Width     /Width
-------------------------------------------------------------------------------------------
/SYS/RCSA/PCIE9   PCIE  SUNW,assigned-device-pciex10df,fc40             5.0GT/x8   2.5GT/x8
                        /pci@380/pci@1/pci@0/pci@a/SUNW,assigned-device@0
/SYS/RCSA/PCIE9   PCIE  SUNW,assigned-device-pciex10df,fc40             5.0GT/x8   2.5GT/x8
                        /pci@380/pci@1/pci@0/pci@a/SUNW,assigned-device@0,1
/SYS/RCSA/PCIE11  PCIE  SUNW,assigned-device-pciex10df,fc40             5.0GT/x8   2.5GT/x8
                        /pci@3c0/pci@1/pci@0/pci@e/SUNW,assigned-device@0
/SYS/RCSA/PCIE11  PCIE  SUNW,assigned-device-pciex10df,fc40             5.0GT/x8   2.5GT/x8
                        /pci@3c0/pci@1/pci@0/pci@e/SUNW,assigned-device@0,1
/SYS/RCSA/PCIE13  PCIE  SUNW,assigned-device-pciex10df,fc40             5.0GT/x8   5.0GT/x4
                        /pci@480/pci@1/pci@0/pci@a/SUNW,assigned-device@0
/SYS/RCSA/PCIE13  PCIE  SUNW,assigned-device-pciex10df,fc40             5.0GT/x8   5.0GT/x4
                        /pci@480/pci@1/pci@0/pci@a/SUNW,assigned-device@0,1
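The check described above (which of the expected slots no longer appears in prtdiag) can be sketched in a few lines of Python. This is only an illustration: the helper names `slots_in_prtdiag` and `missing_slots` are hypothetical, and the sketch assumes the prtdiag output has been saved to text and that slot names follow the /SYS/RCSA/PCIEn pattern shown in this document.

```python
import re

def slots_in_prtdiag(prtdiag_text):
    """Return the set of PCIe slot numbers named in prtdiag IO Devices output."""
    # Assumption: slots appear as /SYS/RCSA/PCIE<n>, as in the output above.
    return {int(n) for n in re.findall(r"/SYS/RCSA/PCIE(\d+)\b", prtdiag_text)}

def missing_slots(prtdiag_text, expected):
    """Return the expected slot numbers that prtdiag does not list."""
    return sorted(set(expected) - slots_in_prtdiag(prtdiag_text))

# Abbreviated sample taken from the prtdiag output in this document.
sample = """\
/SYS/RCSA/PCIE9 PCIE SUNW,assigned-device-pciex10df,fc40 5.0GT/x8 2.5GT/x8
/SYS/RCSA/PCIE11 PCIE SUNW,assigned-device-pciex10df,fc40 5.0GT/x8 2.5GT/x8
/SYS/RCSA/PCIE13 PCIE SUNW,assigned-device-pciex10df,fc40 5.0GT/x8 5.0GT/x4
"""

print(missing_slots(sample, [5, 9, 11, 13]))  # [5]
```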
An FMA fault was generated for PCIE5:

--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                             MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Nov 11 09:37:16 490c94d7-013f-6b85-f62d-98d2992d8c96 ILOM-8000-1G   Major

Problem Status    : repaired
Diag Engine       : fdd / 1.0
System
   Manufacturer   : Oracle Corporation
   Name           : SPARC T5-4
   Part_Number    : 31930909+8+1
   Serial_Number  : AK00117922
   Host_ID        : 862dd3a4

----------------------------------------
Suspect 1 of 1 :
   Fault class    : fault.fruid.replay
   Certainty      : 100%
   Affects        : /chassis=0/rcsa=0/pcie=5/car=0
   FRU
      Location         : "/SYS/RCSA/PCIE5/CAR"
      Manufacturer     : Oracle Corporation
      Name             : TLA,CAR,T5-4,T5-8
      Part_Number      : 7069814
      Revision         : 01
      Serial_Number    : 465769T+1311U20L1L
      Chassis
         Manufacturer  : Oracle Corporation
         Name          : SPARC T5-4
         Part_Number   : 31930909+8+1
         Serial_Number : AK00117922
      Status           : removed

Description : A Field Replaceable Unit (FRU) in the chassis contains records to indicate it is faulty.
Response    : The service-required LED may be illuminated on the affected FRU and chassis.
Impact      : The system may not be able to use one or more components on the affected FRU.
Action      : Please refer to the associated reference document at http://support.oracle.com/msg/ILOM-8000-1G for the latest service procedures and policies regarding this diagnosis.

The Solaris Virtualization software reports the card in PCIE5 as UNK:

bash-4.1$ more ldm_ls-io.out
NAME                 TYPE  BUS    DOMAIN   STATUS
----                 ----  ---    ------   ------
pci_0                BUS   pci_0  primary
pci_1                BUS   pci_1  primary
pci_2                BUS   pci_2  primary  IOV
pci_3                BUS   pci_3  primary  IOV
pci_4                BUS   pci_4  primary  IOV
pci_5                BUS   pci_5  primary
pci_6                BUS   pci_6  primary  IOV
pci_7                BUS   pci_7  primary
/SYS/RCSA/PCIE1      PCIE  pci_0  primary  EMP
/SYS/RCSA/PCIE2      PCIE  pci_0  primary  EMP
/SYS/MB/SASHBA0      PCIE  pci_0  primary  OCC
/SYS/RIO/NET0        PCIE  pci_0  primary  OCC
/SYS/RCSA/PCIE3      PCIE  pci_1  primary  EMP
/SYS/RCSA/PCIE4      PCIE  pci_1  primary  EMP
/SYS/RCSA/PCIE9      PCIE  pci_2  nodeB    OCC
/SYS/RCSA/PCIE10     PCIE  pci_2  primary  EMP
/SYS/RCSA/PCIE11     PCIE  pci_3  nodeA    OCC
/SYS/RCSA/PCIE12     PCIE  pci_3  primary  EMP
/SYS/RCSA/PCIE5      PCIE  pci_4  nodeA    UNK   <<----
/SYS/RCSA/PCIE6      PCIE  pci_4  primary  EMP
/SYS/RCSA/PCIE7      PCIE  pci_5  primary  OCC
/SYS/RCSA/PCIE8      PCIE  pci_5  primary  EMP
/SYS/RCSA/PCIE13     PCIE  pci_6  nodeB    OCC
/SYS/RCSA/PCIE14     PCIE  pci_6  primary  EMP
/SYS/RCSA/PCIE15     PCIE  pci_7  primary  EMP
/SYS/RCSA/PCIE16     PCIE  pci_7  primary  EMP
/SYS/MB/SASHBA1      PCIE  pci_7  primary  OCC
/SYS/RIO/NET2        PCIE  pci_7  primary  OCC
lrwxrwxrwx 1 root root 64 Aug 29 2013 c3 -> ../../devices/pci@400/pci@1/pci@0/pci@e/SUNW,emlxs@0,1/fp@0,0:fc
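On a system with many slots, the UNK row is easy to miss by eye. A minimal sketch of a filter over saved `ldm ls-io` output follows. It assumes the five-column layout shown above (NAME, TYPE, BUS, DOMAIN, STATUS) with whitespace-separated fields, and the function name `unknown_slots` is hypothetical.

```python
def unknown_slots(ldm_io_text):
    """Return (slot, bus, domain) for every PCIE row whose status is UNK."""
    hits = []
    for line in ldm_io_text.splitlines():
        fields = line.split()
        # Assumed data row layout: NAME TYPE BUS DOMAIN STATUS [annotation]
        if len(fields) >= 5 and fields[1] == "PCIE" and fields[4] == "UNK":
            hits.append((fields[0], fields[2], fields[3]))
    return hits

# Abbreviated sample taken from the ldm ls-io output in this document.
sample = """\
NAME                 TYPE  BUS    DOMAIN   STATUS
pci_4                BUS   pci_4  primary  IOV
/SYS/RCSA/PCIE11     PCIE  pci_3  nodeA    OCC
/SYS/RCSA/PCIE5      PCIE  pci_4  nodeA    UNK   <<----
/SYS/RCSA/PCIE6      PCIE  pci_4  primary  EMP
"""

print(unknown_slots(sample))  # [('/SYS/RCSA/PCIE5', 'pci_4', 'nodeA')]
```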
A snapshot of the T5-4 shows:

/SYS/RCSA/PCIE5
   Properties:
      type = Slot
      requested_config_state = Enabled    <<---- should be enabled on next POST (reset /SYS)
      current_config_state = Disabled     <<---- WHY??
      disable_reason = Configuration Rules

/SYS/RCSA/PCIE5/CAR
   Properties:
      type = PCIe Hot Plug Carrier
      fru_description = TLA,CAR,T5-4,T5-8
      fru_manufacturer = Oracle Corporation
      fru_part_number = 7069814           <<---- PCI carrier correct
      fru_rev_level = 01
      fru_serial_number = 465769T+1311U20L1L
      fault_state = OK
      clear_fault_action = (none)

/SYS/RCSA/PCIE5/CAR/CARD
   Properties:
      type = PCIE Module
      fault_state = OK
      clear_fault_action = (none)
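The telling detail in the snapshot is that `requested_config_state` and `current_config_state` disagree. As a rough sketch, the comparison could be automated over the properties text of a single target. The function name `config_state_mismatch` is hypothetical, and the sketch assumes one `key = value` pair per line, which is how the properties are laid out once the snapshot text is unflattened.

```python
def config_state_mismatch(properties_text):
    """Return (requested, current, disable_reason) if the two states differ, else None."""
    props = {}
    for line in properties_text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # only lines of the assumed form "key = value"
            props[key.strip()] = value.strip()
    requested = props.get("requested_config_state")
    current = props.get("current_config_state")
    if requested is not None and current is not None and requested != current:
        return requested, current, props.get("disable_reason")
    return None

# Properties of /SYS/RCSA/PCIE5 as reported in this document.
sample = """\
/SYS/RCSA/PCIE5
   Properties:
      type = Slot
      requested_config_state = Enabled
      current_config_state = Disabled
      disable_reason = Configuration Rules
"""

print(config_state_mismatch(sample))  # ('Enabled', 'Disabled', 'Configuration Rules')
```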
The snapshot also shows the type of card installed in PCIE5:

/System/PCI_Devices/On-board/Device_5
   Properties:
      description = 8-port SAS Controller
      location = SASHBA1 (SAS Controller 1)

/System/PCI_Devices/Add-on
   Properties:

/System/PCI_Devices/Add-on/Device_5
   Properties:
      part_number = SG-XPCIE2FC-EM8-Z
      description = Sun StorageTek Dual 8 Gb Fibre Channel PCIe HBA, Emulex
      location = PCIE5 (PCIE 5)
      pci_vendor_id = 0x10df
      pci_device_id = 0xfc40
      pci_subvendor_id = 0x10df
      pci_subdevice_id = 0xfc42
The link to the PCIe card at POST was OK:

2:0 | /SYS/RCSA/PCIE5/CAR | 815001a00000 | 26:00:0 | 10df | fc40 | 03 | fc42 | 10df | 8 G1

After clearing the FMA faults in ILOM, the FMA shell and Solaris, and power cycling the T5-4 (ILOM and Solaris reset), the problem persists.

Cause

There is an old fault from 2013 that needs to be removed. It can be found in the faultDB data collected by the snapshot:

<root revision="99" version="1.0" qualifier="pod">
<FAULT_LIST VAL="/SYS/RCSA/PCIE5/CAR:930eb93e-8f22-c0c3-a383-f6e1ee722d17:0">
  <FRU>/SYS/RCSA/PCIE5/CAR</FRU>
  <Time_Stamp>2013-10-02/12:11:05</Time_Stamp>
  <UUID>930eb93e-8f22-c0c3-a383-f6e1ee722d17</UUID>
  <MSGID>PCIEX-8000-0A</MSGID>
  <CLASS>fault.io.pciex.device-interr</CLASS>
  <RESOURCE>/SYS/RCSA/PCIE5/CAR/CARD</RESOURCE>
  <ASRU>/SYS/RCSA/PCIE5/CAR/CARD</ASRU>
  <LOCATION>/SYS/RCSA/PCIE5/CAR</LOCATION>
  <PERCENT>100</PERCENT>
  <STATE>268</STATE>
  <RAR_Timestamp>0</RAR_Timestamp>
  <NV0>_cid=152584</NV0>
  <NV1>_list_sz=1</NV1>
  <NV2>_list_idx=0</NV2>
  <NV3>system_component_serial_number=AK00117922</NV3>
  <NV4>system_component_part_number=31930909+8+1</NV4>
  <NV5>system_component_name=SPARC T5-4</NV5>
  <NV6>system_component_manufacturer=Oracle Corporation</NV6>
  <NV7>chassis_serial_number=AK00117922</NV7>
  <NV8>chassis_part_number=31930909+8+1</NV8>
  <NV9>chassis_name=SPARC T5-4</NV9>
  <NV10>chassis_manufacturer=Oracle Corporation</NV10>
  <NV11>system_serial_number=AK00117922</NV11>
  <NV12>system_part_number=31930909+8+1</NV12>
  <NV13>system_name=SPARC T5-4</NV13>
  <NV14>system_manufacturer=Oracle Corporation</NV14>
  <NV15>fru_name=TLA,CAR,T5-4,T5-8</NV15>
  <NV16>fru_manufacturer=Oracle Corporation</NV16>
  <NV17>fru_serial_number=465769T+1311U20L1L</NV17>
  <NV18>fru_rev_level=01</NV18>
  <NV19>fru_part_number=7069814</NV19>
  <NV20>mod-version=1.16</NV20>
  <NV21>mod-name=eft</NV21>
  <NV22>severity=Critical</NV22>
</FAULT_LIST>
</root>

An internal document explains the issue. It can happen with any PCI card on any T5 server platform: T5-2, T5-4, T5-8 and T5-1B.
Solution

Contact the T5-4 server team, who will confirm this issue from the snapshot and provide an escalation password and an action plan to clear the fault. The action plan requires a power cycle of the T5-4 server (a complete power off followed by a power on).

Note: If the system is configured with LDoms, engage the Oracle Support Solaris Virtualization team, as additional steps may be required. On systems configured with LDoms, boot factory-default and confirm that the devices are now visible to Solaris. Re-init the configuration from a backup to recreate the spconfig without the device disable flag being set. For factory-default, simply confirm that the devices are now visible to the OS.

As explained in the internal document "Disabled PCIe devices due to stale faultDB and deconfigDB entries w/SysFW 9.3.0.x" (Doc ID 1999520.1): once it is confirmed that there are no actionable faults and /etc/devices/retire_store is clear, the faultDB and deconfigDB should be purged via the escalation shell.
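Before escalating, it can be useful to list the faultDB entries collected in the snapshot and flag the ones that predate the current problem. The following is a minimal sketch only: it assumes the faultDB data has been repaired into well-formed XML with the element names seen in this document (FAULT_LIST, FRU, Time_Stamp, UUID, CLASS), and both the function name `stale_faults` and the cutoff date are hypothetical. It reads the data only; clearing the entries still goes through the escalation procedure described above.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

def stale_faults(faultdb_xml, older_than):
    """Return (fru, uuid, fault_class, timestamp) for entries older than the cutoff."""
    root = ET.fromstring(faultdb_xml)
    stale = []
    for entry in root.iter("FAULT_LIST"):
        # Assumed timestamp layout, as in this document: 2013-10-02/12:11:05
        stamp = datetime.strptime(entry.findtext("Time_Stamp"), "%Y-%m-%d/%H:%M:%S")
        if stamp < older_than:
            stale.append((entry.findtext("FRU"), entry.findtext("UUID"),
                          entry.findtext("CLASS"), stamp))
    return stale

# Shortened copy of the faultDB entry from this document (NV fields omitted).
sample = (
    '<root revision="99" version="1.0" qualifier="pod">'
    '<FAULT_LIST VAL="/SYS/RCSA/PCIE5/CAR:930eb93e-8f22-c0c3-a383-f6e1ee722d17:0">'
    '<FRU>/SYS/RCSA/PCIE5/CAR</FRU>'
    '<Time_Stamp>2013-10-02/12:11:05</Time_Stamp>'
    '<UUID>930eb93e-8f22-c0c3-a383-f6e1ee722d17</UUID>'
    '<MSGID>PCIEX-8000-0A</MSGID>'
    '<CLASS>fault.io.pciex.device-interr</CLASS>'
    '</FAULT_LIST></root>'
)

# Hypothetical cutoff: anything recorded before 2014 is treated as old.
for fru, uuid, fault_class, stamp in stale_faults(sample, datetime(2014, 1, 1)):
    print(fru, uuid, fault_class, stamp.date())
```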
References

<NOTE:1999520.1> - Disabled PCIe devices due to stale faultDB and deconfigDB entries w/SysFW 9.3.0.x
<NOTE:1332409.1> - How to Repair FMA Module Errors Seen in 'fmadm faulty'
<NOTE:1483194.1> - Commands to run to fully clear ILOM/SP, faultmgmt shell, and FMA faults on the T3-x and T4-x Servers

Attachments

This solution has no attachment