Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1604929.1
Update Date:2017-08-16
Keywords:

Solution Type  Problem Resolution Sure

Solution  1604929.1 :   PCIEX-8000-MH causing AMBER LED On, on T5-x Systems  


Related Items
  • SPARC T5-4
  • SPARC T5-2
Related Categories
  • PLA-Support>Sun Systems>SPARC>CMT>SN-SPARC: T5




In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-8084265801>

Applies to:

SPARC T5-2 - Version All Versions to All Versions [Release All Releases]
SPARC T5-4 - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

A PCIEX-8000-MH fault is diagnosed against a network port, and every component on the PCIe path to that port is flagged as faulted, each with a different certainty.

Cause

The FMA events look like this:

=======================================================================

'fmdump -e' repeatedly reports 'ereport.io.service.degraded' followed by 'ereport.io.service.restored', for example:

<explorer-output>/fma $more fmdump-e.out

:

:

Nov 10 17:42:23.3151 ereport.io.service.degraded
Nov 10 17:42:23.9557 ereport.io.service.restored
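Because the signature is a degraded event followed within a second by a restored event, counting these cycles in a saved fmdump-e.out can show how often the port flaps. A minimal Python sketch, assuming each event line has the `Mon DD HH:MM:SS.ffff class` layout shown above and that all events fall on the same day; `count_flaps` is a hypothetical helper, not an Oracle tool:

```python
"""Count degraded -> restored cycles in saved `fmdump -e` output (sketch)."""

DEGRADED = "ereport.io.service.degraded"
RESTORED = "ereport.io.service.restored"


def seconds_of_day(clock: str) -> float:
    """Convert 'HH:MM:SS.ffff' into seconds since midnight."""
    hours, minutes, seconds = clock.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def count_flaps(text: str) -> list[float]:
    """Return the duration in seconds of each degraded -> restored cycle.

    ASSUMPTION: event lines look like 'Nov 10 17:42:23.3151 <class>' and
    all events fall on the same day (no midnight roll-over handling).
    """
    durations: list[float] = []
    degraded_at = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip the TIME/CLASS header, ':' elisions, blank lines
        _month, _day, clock, event_class = fields
        if event_class == DEGRADED:
            degraded_at = seconds_of_day(clock)
        elif event_class == RESTORED and degraded_at is not None:
            durations.append(seconds_of_day(clock) - degraded_at)
            degraded_at = None
    return durations


if __name__ == "__main__":
    sample = """TIME                 CLASS
Nov 10 17:42:23.3151 ereport.io.service.degraded
Nov 10 17:42:23.9557 ereport.io.service.restored
"""
    flaps = count_flaps(sample)
    print(f"{len(flaps)} cycle(s), longest {max(flaps):.4f} s")
```

Run against the two events above, it prints `1 cycle(s), longest 0.6406 s`. A restored event with no preceding degraded event is ignored rather than counted.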

==========

Checking the verbose fmdump output of the errlog shows:


<explorer-output>/fma/var/fm/fmd $fmdump -V -t 11/10/2013 -T 11/11/2013 errlog
TIME                           CLASS
Nov 10 2013 07:42:23.315125300 ereport.io.service.degraded
nvlist version: 0
       class = ereport.io.service.degraded
       ena = 0x7afcf19a75a00801
       detector = (embedded nvlist)
       nvlist version: 0
               version = 0x0
               scheme = dev
               device-path = /pci@300/pci@1/pci@0/pci@4/pci@0/pci@8/network@0,1
       (end detector)

       __ttl = 0x1
       __tod = 0x527f38df 0x12c86e34

Nov 10 2013 07:42:23.955726041 ereport.io.service.restored
nvlist version: 0
       class = ereport.io.service.restored
       ena = 0x7aff547fd0321c01
       detector = (embedded nvlist)
       nvlist version: 0
               version = 0x0
               scheme = dev
               device-path = /pci@300/pci@1/pci@0/pci@4/pci@0/pci@8/network@0,1
       (end detector)

       __ttl = 0x1
       __tod = 0x527f38df 0x38f738d9

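The `__tod` member holds two hex words. Decoded as seconds since the Unix epoch and nanoseconds, the values above give exactly the times in the verbose listing when read as UTC (07:42:23.315125300 and 07:42:23.955726041), so the port was degraded for roughly 0.64 s. A small Python sketch of that decoding; the interpretation of `__tod` is inferred from this match rather than taken from FMA documentation, and `decode_tod`/`gap_seconds` are hypothetical names:

```python
"""Decode an FMA '__tod' pair (hex seconds, hex nanoseconds) - sketch."""
from datetime import datetime, timezone


def decode_tod(tod: str) -> tuple[datetime, int]:
    """Parse a string such as '0x527f38df 0x12c86e34'.

    Returns the UTC time (whole seconds) and the nanosecond part
    separately, because datetime only keeps microseconds.
    """
    sec_hex, nsec_hex = tod.split()
    seconds, nanoseconds = int(sec_hex, 16), int(nsec_hex, 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc), nanoseconds


def gap_seconds(first: str, second: str) -> float:
    """Elapsed time between two __tod values, in seconds."""
    (t1, ns1), (t2, ns2) = decode_tod(first), decode_tod(second)
    return (t2 - t1).total_seconds() + (ns2 - ns1) / 1e9


if __name__ == "__main__":
    degraded = "0x527f38df 0x12c86e34"  # __tod of the degraded ereport
    restored = "0x527f38df 0x38f738d9"  # __tod of the restored ereport
    when, ns = decode_tod(degraded)
    print(f"{when:%b %d %Y %H:%M:%S}.{ns:09d}")
    print(f"degraded for {gap_seconds(degraded, restored):.6f} s")
```

It prints `Nov 10 2013 07:42:23.315125300` and then `degraded for 0.640601 s`.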
=======================

The faults log in the ILOM snapshot FMA data shows:

<ilom-snapshot>/fma $more @persist@faultdiags@faults.log

:

:

2013-11-16/12:30:42  767966d3-3d23-6b9e-c75f-e1a128679899   PCIEX-8000-MH
   list_sz = 5

   fault[0] = fault.io.pciex.device-interr-unaf               <<<<<<<< too many correctable events associated with the device.
       certainty = 22.0 %
       FRU       = /SYS/MB
       ASRU      = dev:////pci@300/pci@1/pci@0/pci@4/pci@0/pci@8/network@0,1               <<<<<<<<<<<<< this is the network controller chip on the RIO (in a T5-4).
       RESOURCE  = hc:///chassis=0/motherboard=0/cpuboard=0/chip=0/hostbridge=0/pciexrc=0/pciexbus=1/pciexdev=0/pciexfn=0/pciexbus=2/pciexdev=4/pc
       fru_part_number = xxx
       fru_serial_number = xxx
       chassis_serial_number = xxx

   fault[1] = fault.io.pciex.device-interr-unaf
       certainty = 22.0 %
       FRU       = /SYS/MB
       ASRU      = dev:////pci@300/pci@1/pci@0/pci@4/pci@0
       RESOURCE  = hc:///chassis=0/motherboard=0/cpuboard=0/chip=0/hostbridge=0/pciexrc=0/pciexbus=1/pciexdev=0/pciexfn=0/pciexbus=2/pciexdev=4/pc
       fru_part_number = xxx
       fru_serial_number = xxx
       chassis_serial_number = xxx

   fault[2] = fault.io.pciex.device-interr-unaf
       certainty = 22.0 %
       FRU       = /SYS/MB
       ASRU      = dev:////pci@300/pci@1/pci@0/pci@4
       RESOURCE  = hc:///chassis=0/motherboard=0/cpuboard=0/chip=0/hostbridge=0/pciexrc=0/pciexbus=1/pciexdev=0/pciexfn=0/pciexbus=2/pciexdev=4/pc
       fru_part_number = xxx
       fru_serial_number = xxx
       chassis_serial_number = xxx

   fault[3] = fault.io.pciex.device-interr-unaf
       certainty = 22.0 %
       FRU       = /SYS/MB
       ASRU      = dev:////pci@300/pci@1/pci@0
       RESOURCE  = hc:///chassis=0/motherboard=0/cpuboard=0/chip=0/hostbridge=0/pciexrc=0/pciexbus=1/pciexdev=0/pciexfn=0
       fru_part_number = xxx
       fru_serial_number = xxx
       chassis_serial_number = xxx

   fault[4] = fault.io.pciex.device-interr-unaf
       certainty = 11.0 %
       FRU       = /SYS/PM0
       ASRU      = dev:////pci@300/pci@1
       RESOURCE  = hc:///chassis=0/motherboard=0/cpuboard=0/chip=0/hostbridge=0/pciexrc=0
       fru_part_number = xxx
       fru_serial_number = xxx
       chassis_serial_number = xxx

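The suspect list spreads the diagnosis along the whole PCIe path: each ASRU is either the network function itself or one of its upstream bridges, and the certainties add up per FRU (4 x 22.0 % = 88 % for /SYS/MB, 11 % for /SYS/PM0). A minimal Python sketch that checks both properties on one diagnosis copied out of faults.log, assuming the `key = value` layout shown above and ignoring the `<<<<` annotations; all function names are hypothetical:

```python
"""Summarise the suspect list of one ILOM faults.log diagnosis (sketch)."""
import re
from collections import defaultdict

# Matches 'fault[N] = ...', 'certainty = ...', 'FRU = ...' and 'ASRU = ...';
# only the first token after '=' is kept, which drops '%' and '<<<<' notes.
FIELD = re.compile(r"^\s*(fault\[\d+\]|certainty|FRU|ASRU)\s*=\s*(\S+)")


def parse_suspects(text: str) -> list[dict[str, str]]:
    """Return one dict per fault[N] entry with class, certainty, FRU, ASRU."""
    suspects: list[dict[str, str]] = []
    for line in text.splitlines():
        match = FIELD.match(line)
        if not match:
            continue  # RESOURCE, fru_* and list_sz lines are ignored
        key, value = match.groups()
        if key.startswith("fault["):
            suspects.append({"class": value})
        elif suspects:
            suspects[-1][key] = value
    return suspects


def certainty_by_fru(suspects: list[dict[str, str]]) -> dict[str, float]:
    """Add up the certainty percentages per FRU."""
    totals: dict[str, float] = defaultdict(float)
    for suspect in suspects:
        totals[suspect["FRU"]] += float(suspect["certainty"])
    return dict(totals)


def asru_path(asru: str) -> str:
    """'dev:////pci@300/pci@1' -> '/pci@300/pci@1'."""
    return "/" + asru.removeprefix("dev:").lstrip("/")


def all_on_one_path(suspects: list[dict[str, str]]) -> bool:
    """True if every ASRU is the deepest device or one of its ancestors."""
    paths = sorted((asru_path(s["ASRU"]) for s in suspects), key=len)
    leaf = paths[-1]
    return all(leaf == p or leaf.startswith(p + "/") for p in paths)


if __name__ == "__main__":
    sample = """\
   fault[0] = fault.io.pciex.device-interr-unaf
       certainty = 22.0 %
       FRU       = /SYS/MB
       ASRU      = dev:////pci@300/pci@1/pci@0/pci@4/pci@0/pci@8/network@0,1
   fault[4] = fault.io.pciex.device-interr-unaf
       certainty = 11.0 %
       FRU       = /SYS/PM0
       ASRU      = dev:////pci@300/pci@1
"""
    found = parse_suspects(sample)
    print(certainty_by_fru(found), all_on_one_path(found))
```

Fed the full five-entry list above, it would report 88.0 for /SYS/MB, 11.0 for /SYS/PM0, and `True` for the path check.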
===================================

At first glance this looks like either a network port issue or a driver issue.

In this SR, we replaced the RIO, which contains the network controller, but that did not help. This could also be a driver issue, as described in document 1951204.1, which was published later.

=======================================================================

To investigate further, check whether LDoms are configured and whether the conditions listed in document 1593243.1 are met.

 

In this case, the conditions are met:

<explorer-output>/sysconfig $grep extended-mapin ldm_list_-l.out
  extended-mapin-space=on
  extended-mapin-space=off <<<<<<<<<<<< this should be on, as per document 1593243.1
  extended-mapin-space=on
  extended-mapin-space=on
  extended-mapin-space=on
  extended-mapin-space=on
  extended-mapin-space=on
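Because `grep` drops the domain name, the `off` line on its own does not say which domain needs reconfiguring. The sketch below walks the full `ldm list -l` output instead. It assumes, without having been checked against every LDoms release, that each domain block starts with a `NAME  STATE ...` header followed by a line whose first field is the domain name; the helper names and the sample domains (`ldg1`) are hypothetical:

```python
"""List LDoms whose extended-mapin-space is off, from saved `ldm list -l` output (sketch)."""


def mapin_setting_by_domain(text: str) -> dict[str, str]:
    """Map each domain name to its extended-mapin-space value.

    ASSUMPTION: every domain block starts with a header whose first two
    fields are 'NAME STATE', and the next non-empty line starts with the
    domain name. Sub-section headers (e.g. 'NAME VOLUME ...') are ignored.
    """
    settings: dict[str, str] = {}
    domain = None
    expect_name = False
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue
        if fields[:2] == ["NAME", "STATE"]:
            expect_name = True
        elif expect_name:
            domain = fields[0]
            expect_name = False
        elif fields[0].startswith("extended-mapin-space=") and domain:
            settings[domain] = fields[0].split("=", 1)[1]
    return settings


def domains_with_mapin_off(text: str) -> list[str]:
    """Domains whose extended-mapin-space value is 'off'."""
    return [name for name, value in mapin_setting_by_domain(text).items()
            if value == "off"]


if __name__ == "__main__":
    # Abridged, hypothetical layout; real output has many more sections.
    sample = """\
NAME             STATE      FLAGS   CONS    VCPU  MEMORY
primary          active     -n-cv-  UART    16    32G
CONTROL
    extended-mapin-space=on
NAME             STATE      FLAGS   CONS    VCPU  MEMORY
ldg1             active     -n----  5000    8     16G
CONTROL
    extended-mapin-space=off
"""
    print(domains_with_mapin_off(sample))
```

On the abridged sample it prints `['ldg1']`; any domain it lists is a candidate for the reconfiguration in the Solution section.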
 

 

Solution

1) Reconfigure the affected domain (in this example, lasun318) with 'extended-mapin-space=on', as per document 1593243.1.

2) Clear the faults from Solaris FMA and ILOM FMA (document 1483194.1).

3) If the problem persists, log a service request with Oracle Support for further investigation.
 

References

<NOTE:1593243.1> - Solaris 10 and 11 Virtual Network Switch Can Corrupt TCP Packets Or Hang Interface When 'extended-mapin-space' is Off
<NOTE:1021334.1> - PCIEX-8000-MH - PCIEX subsystem problem
<NOTE:1951204.1> - Solaris 10, Solaris 11, and ZFS Storage Appliance Software (ZFSSA) Using the ixgbe(7D) Driver may Experience a NIC chip Reset and Report FMA Errors

  Copyright © 2018 Oracle, Inc.  All rights reserved.