Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1586795.1
Update Date:2018-03-07
Keywords:

Solution Type  Problem Resolution Sure

Solution  1586795.1 :   Emlxs ERROR: 530: Mailbox Timeout. (HEARTBEAT: Nowait.)  


Related Items
  • SPARC T4-1B
  • Emulex FC HBA
Related Categories
  • PLA-Support>Sun Systems>DISK>HBA>SN-DK: FC HBA




In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-7801723481>

Applies to:

Emulex FC HBA - Version Not Applicable to Not Applicable [Release N/A]
SPARC T4-1B - Version Not Applicable to Not Applicable [Release N/A]
Information in this document applies to any platform.

Symptoms

The system is a Solaris 10 Sun Blade 6000 T4-1B server (t4server1)
with two Oracle 8Gb PCI-E Dual FC / Dual Gigabit Ethernet EM cards (LPem12002E-S, PN 371-4666).



None of the FC ports are connected to the SAN or FC switches:


bash-4.1$ more fcinfo.out
HBA Port WWN: 10000000c9dfce10
  OS Device Name: /dev/cfg/c2
  Manufacturer: Emulex
  Model: LPem12002E-S
  Firmware Version: 2.00a5 (U3D2.00A5)
  FCode/BIOS Version: Boot:5.12a2 Fcode:3.10a3
  Serial Number: 0999VM0-12330024S3
  Driver Name: emlxs
  Driver Version: 2.60k (2011.03.24.16.45)
  Type: unknown
  State: offline
  Supported Speeds: 2Gb 4Gb 8Gb
  Current Speed: not established
  Node WWN: 20000000c9dfce10
HBA Port WWN: 10000000c9dfce11
  OS Device Name: /dev/cfg/c3
  Manufacturer: Emulex
  Model: LPem12002E-S
  Firmware Version: 2.00a5 (U3D2.00A5)
  FCode/BIOS Version: Boot:5.12a2 Fcode:3.10a3
  Serial Number: 0999VM0-12330024S3
  Driver Name: emlxs
  Driver Version: 2.60k (2011.03.24.16.45)
  Type: unknown
  State: offline
  Supported Speeds: 2Gb 4Gb 8Gb
  Current Speed: not established
  Node WWN: 20000000c9dfce11
HBA Port WWN: 10000000c9dfb4a0
  OS Device Name: /dev/cfg/c7
  Manufacturer: Emulex
  Model: LPem12002E-S
  Firmware Version: 2.00a5 (U3D2.00A5)
  FCode/BIOS Version: Boot:5.12a2 Fcode:3.10a3
  Serial Number: 0999VM0-123300250W
  Driver Name: emlxs
  Driver Version: 2.60k (2011.03.24.16.45)
  Type: unknown
  State: offline
  Supported Speeds: 2Gb 4Gb 8Gb
  Current Speed: not established
  Node WWN: 20000000c9dfb4a0
  Link Error Statistics:
  Link Failure Count: 0
  Loss of Sync Count: 0
  Loss of Signal Count: 1
  Primitive Seq Protocol Error Count: 0
  Invalid Tx Word Count: 0
  Invalid CRC Count: 0
HBA Port WWN: 10000000c9dfb4a1
  OS Device Name: /dev/cfg/c8
  Manufacturer: Emulex
  Model: LPem12002E-S
  Firmware Version: 2.00a5 (U3D2.00A5)
  FCode/BIOS Version: Boot:5.12a2 Fcode:3.10a3
  Serial Number: 0999VM0-123300250W
  Driver Name: emlxs
  Driver Version: 2.60k (2011.03.24.16.45)
  Type: unknown
  State: offline
  Supported Speeds: 2Gb 4Gb 8Gb
  Current Speed: not established
  Node WWN: 20000000c9dfb4a1
  Link Error Statistics:
  Link Failure Count: 0
  Loss of Sync Count: 0
  Loss of Signal Count: 1
  Primitive Seq Protocol Error Count: 0
  Invalid Tx Word Count: 0
  Invalid CRC Count: 0
bash-4.1$
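A listing like the one above can be checked mechanically when there are many ports. The following is a minimal sketch, assuming the output has been saved as text in the "Key: value" layout shown; the function names `parse_fcinfo` and `offline_ports` are hypothetical helpers, not part of any Oracle tool.

```python
def parse_fcinfo(text):
    """Split fcinfo-style output into one dict per HBA port."""
    ports = []
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # shell prompts and blank lines carry no "Key: value" pair
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "HBA Port WWN":
            ports.append({})  # each port section starts at its WWN line
        if ports:
            ports[-1][key] = value.strip()
    return ports


def offline_ports(ports):
    """Return the port WWNs whose State field reads 'offline'."""
    return [p["HBA Port WWN"] for p in ports if p.get("State") == "offline"]


# Abbreviated sample taken from the fcinfo.out listing above.
SAMPLE = """\
HBA Port WWN: 10000000c9dfce10
  OS Device Name: /dev/cfg/c2
  Firmware Version: 2.00a5 (U3D2.00A5)
  State: offline
HBA Port WWN: 10000000c9dfb4a0
  OS Device Name: /dev/cfg/c7
  State: offline
  Link Error Statistics:
  Loss of Signal Count: 1
"""

ports = parse_fcinfo(SAMPLE)
print(offline_ports(ports))
```

Only the first colon on each line is treated as the separator, so values such as "Boot:5.12a2 Fcode:3.10a3" are kept whole.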



--> With no other explanation, the messages file shows that one HBA shut down at "Aug 23 08:42:36", and FMA flagged it as a faulty device:

Aug 23 08:42:36 t4server1 emlxs: [ID 349649 kern.info] [13.1714]emlxs0: ERROR: 530: Mailbox timeout. (HEARTBEAT: Nowait.)
Aug 23 08:42:39 t4server1 emlxs: [ID 349649 kern.info] [ 6.0889]emlxs0:WARNING: 231: Adapter shutdown. (Reboot required.)
Aug 23 08:42:45 t4server1 emlxs: [ID 349649 kern.info] [13.1714]emlxs1: ERROR: 530: Mailbox timeout. (HEARTBEAT: Nowait.)
Aug 23 08:42:48 t4server1 emlxs: [ID 349649 kern.info] [ 6.0889]emlxs1:WARNING: 231: Adapter shutdown. (Reboot required.)
Aug 23 08:43:06 t4server1 genunix: [ID 408114 kern.info] /pci@400/pci@1/pci@0/pci@4/pci@0/pci@3/SUNW,emlxs@0 (emlxs0) down
Aug 23 08:43:06 t4server1 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: PCIEX-8000-0A, TYPE: Fault, VER: 1, SEVERITY: Critical
Aug 23 08:43:06 t4server1 EVENT-TIME: Friday, 23 August 2013 08:43:06 BST
Aug 23 08:43:06 t4server1 PLATFORM: ORCL,SPARC-T4-1B, CSN: -, HOSTNAME: t4server1
Aug 23 08:43:06 t4server1 SOURCE: eft, REV: 1.16
Aug 23 08:43:06 t4server1 EVENT-ID: c09160a5-d272-ef27-efc5-e4a69388f81e
Aug 23 08:43:06 t4server1 DESC: A problem was detected for a PCIEX device.
Aug 23 08:43:06 t4server1 Refer to http://sun.com/msg/PCIEX-8000-0A for more information.
Aug 23 08:43:06 t4server1 AUTO-RESPONSE: One or more device instances may be disabled
Aug 23 08:43:06 t4server1 IMPACT: Loss of services provided by the device instances associated with this fault
Aug 23 08:43:06 t4server1 REC-ACTION: Schedule a repair procedure to replace the affected device. Use fmadm faulty to identify the device or contact Sun for support.
Aug 23 08:43:07 t4server1 SC Alert: [ID 805402 daemon.alert] Fault | critical: Fault detected at time = Fri Aug 23 07:43:06 2013. The suspect component: /SYS/MB/PCI-EM0 has fault.io.pciex.device-interr with probability=100. Refer to http://support.oracle.com/msg/PCIEX-8000-0A for details.
Aug 23 08:43:09 t4server1 emlxs: [ID 349649 kern.info] [13.0812]emlxs0: ERROR: 240: Adapter reset failed. (Timeout: status=0x400400)
Aug 23 08:43:09 t4server1 emlxs: [ID 349649 kern.info] [13.00FB]emlxs0: ERROR: 201: Adapter initialization failed. (Unable to init hba.)
Aug 23 08:43:09 t4server1 emlxs: [ID 349649 kern.info] [ 5.067D]emlxs0: ERROR: 201: Adapter initialization failed. (status=5)
Aug 23 08:43:15 t4server1 genunix: [ID 408114 kern.info] /pci@400/pci@1/pci@0/pci@4/pci@0/pci@3/SUNW,emlxs@0 (emlxs0) down
Aug 23 08:43:15 t4server1 genunix: [ID 408114 kern.info] /pci@400/pci@1/pci@0/pci@4/pci@0/pci@3/SUNW,emlxs@0,1 (emlxs1) down
Aug 23 08:43:15 t4server1 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: PCIEX-8000-0A, TYPE: Fault, VER: 1, SEVERITY: Critical
Aug 23 08:43:15 t4server1 EVENT-TIME: Friday, 23 August 2013 08:43:15 BST
Aug 23 08:43:15 t4server1 PLATFORM: ORCL,SPARC-T4-1B, CSN: -, HOSTNAME: t4server1
Aug 23 08:43:15 t4server1 SOURCE: eft, REV: 1.16
Aug 23 08:43:15 t4server1 EVENT-ID: 5ccdc277-6469-4240-921d-d0ba1ea1c9a6
Aug 23 08:43:15 t4server1 DESC: A problem was detected for a PCIEX device.
Aug 23 08:43:15 t4server1 Refer to http://sun.com/msg/PCIEX-8000-0A for more information.
Aug 23 08:43:15 t4server1 AUTO-RESPONSE: One or more device instances may be disabled
Aug 23 08:43:15 t4server1 IMPACT: Loss of services provided by the device instances associated with this fault
Aug 23 08:43:15 t4server1 REC-ACTION: Schedule a repair procedure to replace the affected device. Use fmadm faulty to identify the device or contact Sun for support.
Aug 23 08:43:15 t4server1 SC Alert: [ID 387653 daemon.alert] Fault | critical: Fault detected at time = Fri Aug 23 07:43:15 2013. The suspect component: /SYS/MB/PCI-EM0 has fault.io.pciex.device-interr with probability=100. Refer to http://support.oracle.com/msg/PCIEX-8000-0A for details.
Aug 23 08:43:18 t4server1 emlxs: [ID 349649 kern.info] [13.0812]emlxs1: ERROR: 240: Adapter reset failed. (Timeout: status=0x400400)
Aug 23 08:43:18 t4server1 emlxs: [ID 349649 kern.info] [13.00FB]emlxs1: ERROR: 201: Adapter initialization failed. (Unable to init hba.)
Aug 23 08:43:18 t4server1 emlxs: [ID 349649 kern.info] [ 5.067D]emlxs1: ERROR: 201: Adapter initialization failed. (status=5)
Aug 23 08:43:39 t4server1 genunix: [ID 408114 kern.info] /pci@400/pci@1/pci@0/pci@4/pci@0/pci@3/SUNW,emlxs@0,1 (emlxs1) down
Aug 23 08:43:42 t4server1 emlxs: [ID 349649 kern.info] [13.0812]emlxs0: ERROR: 240: Adapter reset failed. (Timeout: status=0x400400)
Aug 23 08:43:42 t4server1 emlxs: [ID 349649 kern.info] [13.00FB]emlxs0: ERROR: 201: Adapter initialization failed. (Unable to init hba.)
Aug 23 08:43:42 t4server1 emlxs: [ID 349649 kern.info] [ 5.067D]emlxs0: ERROR: 201: Adapter initialization failed. (status=5)
Aug 23 08:43:48 t4server1 genunix: [ID 408114 kern.info] /pci@400/pci@1/pci@0/pci@4/pci@0/pci@3/SUNW,emlxs@0 (emlxs0) down
Aug 23 08:43:51 t4server1 emlxs: [ID 349649 kern.info] [13.0812]emlxs1: ERROR: 240: Adapter reset failed. (Timeout: status=0x400400)
Aug 23 08:43:51 t4server1 emlxs: [ID 349649 kern.info] [13.00FB]emlxs1: ERROR: 201: Adapter initialization failed. (Unable to init hba.)
Aug 23 08:43:51 t4server1 emlxs: [ID 349649 kern.info] [ 5.067D]emlxs1: ERROR: 201: Adapter initialization failed. (status=5)
Aug 23 08:44:12 t4server1 genunix: [ID 408114 kern.info] /pci@400/pci@1/pci@0/pci@4/pci@0/pci@3/SUNW,emlxs@0,1 (emlxs1) down
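To see the order of events per emlxs instance without reading the whole log, the emlxs lines can be reduced to (instance, level, code) records. This is a rough sketch that assumes the syslog line layout shown above; the regular expression and the helper names `parse_emlxs` and `first_seen` are illustrative assumptions, not a documented format or tool.

```python
import re

# Assumed layout: "<Mon DD HH:MM:SS> <host> emlxs: [ID ...] [x.y]emlxsN: LEVEL: CODE: text"
EMLXS_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+emlxs:.*?\]"
    r"(?P<inst>emlxs\d+):\s*(?P<level>[A-Z]+):\s*(?P<code>\d+):\s*(?P<text>.*)$"
)


def parse_emlxs(lines):
    """Return one dict per emlxs driver message; other lines are ignored."""
    events = []
    for line in lines:
        m = EMLXS_LINE.match(line)
        if m:
            ev = m.groupdict()
            ev["code"] = int(ev["code"])
            events.append(ev)
    return events


def first_seen(events):
    """Map (instance, code) to the timestamp of its first occurrence."""
    seen = {}
    for ev in events:
        seen.setdefault((ev["inst"], ev["code"]), ev["ts"])
    return seen


# Sample lines copied from the messages excerpt above.
LOG = [
    "Aug 23 08:42:36 t4server1 emlxs: [ID 349649 kern.info] [13.1714]emlxs0: ERROR: 530: Mailbox timeout. (HEARTBEAT: Nowait.)",
    "Aug 23 08:42:39 t4server1 emlxs: [ID 349649 kern.info] [ 6.0889]emlxs0:WARNING: 231: Adapter shutdown. (Reboot required.)",
    "Aug 23 08:43:06 t4server1 genunix: [ID 408114 kern.info] /pci@400/pci@1/pci@0/pci@4/pci@0/pci@3/SUNW,emlxs@0 (emlxs0) down",
    "Aug 23 08:43:09 t4server1 emlxs: [ID 349649 kern.info] [13.0812]emlxs0: ERROR: 240: Adapter reset failed. (Timeout: status=0x400400)",
]

events = parse_emlxs(LOG)
for (inst, code), ts in first_seen(events).items():
    print(ts, inst, code)
```

In this excerpt the sequence for each instance is 530 (mailbox timeout), then 231 (adapter shutdown), then 240 and 201 (reset and initialization failures).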


The same FMA event was logged on the SP of the T4-1B:

------------------- ------------------------------------ -------------- --------
Time UUID msgid Severity
------------------- ------------------------------------ -------------- --------
2013-08-23/07:43:06 c09160a5-d272-ef27-efc5-e4a69388f81e PCIEX-8000-0A Critical

Fault class : fault.io.pciex.device-interr

FRU : /SYS/MB/PCI-EM0
  (Part Number: unknown)
  (Serial Number: unknown)

Description : A fault has been diagnosed by the Host Operating System.

Response : The service required LED on the chassis and on the affected
  FRU may be illuminated.

Impact : No SP impact. Check the Host OS for more information.

Action : The administrator should review the fault on the Host OS.
  Please refer to the Details section of the Knowledge Article
  for additional information.

------------------- ------------------------------------ -------------- --------
Time UUID msgid Severity
------------------- ------------------------------------ -------------- --------
2013-08-23/07:43:15 5ccdc277-6469-4240-921d-d0ba1ea1c9a6 PCIEX-8000-0A Critical

Fault class : fault.io.pciex.device-interr

FRU : /SYS/MB/PCI-EM0
  (Part Number: unknown)
  (Serial Number: unknown)

Description : A fault has been diagnosed by the Host Operating System.

Response : The service required LED on the chassis and on the affected
  FRU may be illuminated.

Impact : No SP impact. Check the Host OS for more information.

Action : The administrator should review the fault on the Host OS.
  Please refer to the Details section of the Knowledge Article
  for additional information.
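The host and SP listings describe the same faults when the host EVENT-ID values match the SP UUID values. A small sketch of that comparison follows, assuming both outputs are available as text; `host_event_ids` and `sp_fault_uuids` are hypothetical helper names.

```python
import re

UUID_RE = re.compile(r"\b[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}\b")


def host_event_ids(messages_text):
    """Collect UUIDs from the EVENT-ID lines that fmd writes to the messages file."""
    ids = set()
    for line in messages_text.splitlines():
        if "EVENT-ID:" in line:
            ids.update(UUID_RE.findall(line))
    return ids


def sp_fault_uuids(sp_text):
    """Collect every UUID that appears in the SP fault listing."""
    return set(UUID_RE.findall(sp_text))


# Lines copied from the host messages file and the SP listing above.
HOST = """\
Aug 23 08:43:06 t4server1 EVENT-ID: c09160a5-d272-ef27-efc5-e4a69388f81e
Aug 23 08:43:15 t4server1 EVENT-ID: 5ccdc277-6469-4240-921d-d0ba1ea1c9a6
"""
SP = """\
2013-08-23/07:43:06 c09160a5-d272-ef27-efc5-e4a69388f81e PCIEX-8000-0A Critical
2013-08-23/07:43:15 5ccdc277-6469-4240-921d-d0ba1ea1c9a6 PCIEX-8000-0A Critical
"""

matched = host_event_ids(HOST) & sp_fault_uuids(SP)
host_only = host_event_ids(HOST) - sp_fault_uuids(SP)
print(sorted(matched), sorted(host_only))
```

Matching on the UUID rather than the timestamp avoids the one-hour difference visible between the host entries (08:43, BST) and the SP entries (07:43).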

 

 

Cause

1) The error "ERROR: 530: Mailbox timeout. (HEARTBEAT: Nowait.)" may be caused by a firmware issue, for example:

Bug 15825903 : SUNBT7207068 SUNWVTS EXTERNAL LOOPBACK TEST ON GANYMEDE EM WITH/WITHOUT CABLES C
--> Fix in firmware drop 1.1.27.0

Bug 16908133 : EMLXSX: ERROR: 530: MAILBOX TIMEOUT FOLLOWED BY FAULT WHEN REMOVING LOOPBACKS
--> Fix delivered in 1.1.43.8 firmware.

*** Bug 17331148 : REPEATED PCIEX-8000-0A REPORTED EVEN AFTER PCIEX HBA HAS BEEN REPLACED TWICE
--> Fixed in emlxs patch 149173-04

 

2) On the other hand, this error may be caused by a hardware problem on the FC HBA. The following bug was opened for a customer who already had patch 149173-04 installed, and Emulex confirmed a hardware problem on the FC HBA:
Bug 19856029 - Emulex FC HBA ERROR: 240: Adapter reset failed. (Timeout: status=0x400400)


Another customer had Solaris 10 emlxs patch 144188-02 installed, then installed emlxs patch 149173-03, cleared the FMA errors, and rebooted the server, but this did not solve the issue.

Failed FC HBA, based on customer testing:
--------------------
After investigation it is the card which requires replacement,

I re-seated the card, moved the card from slot 0.0 to 0.1 and back, also removed the cables, each time starting from a power off situation.

Each time the fault cleared during POST then failed during a probe-scsi-all, with the message 'can't open SCSI host adapter' and the card LEDs went off.
--------------------
 

3) A more recent bug was opened for 16Gb FC HBAs, and Emulex provided a fix in new firmware 11.1.218.17:

Bug 25408165 : ERROR: 530: Mailbox timeout.

 

Solution

1) For 16Gb FC HBAs, upgrade the FC HBA firmware to 11.1.218.17 (file A11.1.218.17.grp), available from the Avago web site.

 

If the Avago link is not working, download it from:
http://hba-eng.us.oracle.com/FTP/pub/fc/ganymede/e/SW5.1-RC3/Firmware/

 

2) Before replacing the FC HBA, try the following steps:

1. Clear the associated FMA errors.

 

2. On Solaris 10: Install emlxs patch 149173-04 (or later).

On Solaris 11: Upgrade to Oracle Solaris 11.1.16.5.0 (or later).

See:

Solaris[TM] 11 (not for Solaris[TM] 10 or before) Oracle Fibre Channel (FC) HBA Driver and Firmware Matrix (Doc ID 1514218.1)

Solaris[TM] 10 SPARC (not x86) Oracle Fibre Channel (FC) HBA Driver and Firmware Matrix (Doc ID 1311825.1)

Solaris[TM] 10 x86 (not SPARC) Oracle Fibre Channel (FC) HBA Driver and Firmware Matrix (Doc ID 1311817.1)

 

3. Reboot the server, then upgrade the firmware on the HBA ports as explained in this document:

Warning:1540: Firmware Update Required. (A Manual Hba Reset Or Link Reset (Using Luxadm Or Fcadm) Is Required [Doc ID 1356876.1]

 

4. If that does not solve the issue, the FC HBA will need to be replaced. You can also visually inspect the port LEDs to confirm the current status:

How to Interpret Oracle Fibre Channel (FC) HBA Port LED Patterns (Doc ID 1399644.1)
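The Solaris 10 patch requirement in step 2 above (149173-04 or later) can be verified from a saved patch listing. The sketch below assumes the listing uses the "Patch: <id>-<rev> Obsoletes: ..." line layout printed by `showrev -p`; the helper names are hypothetical and the sample lines are illustrative, not taken from the affected system.

```python
import re


def installed_revision(showrev_output, patch_id):
    """Return the highest installed revision of patch_id, or None if absent."""
    pattern = rf"^Patch:\s+{re.escape(patch_id)}-(\d+)\b"
    revs = [int(m.group(1))
            for m in re.finditer(pattern, showrev_output, re.MULTILINE)]
    return max(revs) if revs else None


def meets_minimum(showrev_output, patch_id, min_rev):
    """True when patch_id is installed at revision min_rev or higher."""
    rev = installed_revision(showrev_output, patch_id)
    return rev is not None and rev >= min_rev


# Illustrative listings (assumed layout; package names are placeholders).
BEFORE = "Patch: 149173-03 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWemlxs\n"
AFTER = BEFORE + "Patch: 149173-04 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWemlxs\n"

print(meets_minimum(BEFORE, "149173", 4))
print(meets_minimum(AFTER, "149173", 4))
```

Comparing the revision as an integer keeps "04" and "4" equivalent and treats any later revision as satisfying the requirement.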
 

References

<NOTE:1918685.1> - Fibre Channel (FC) HBA card messages "Adapter reset failed." "Adapter initialization failed."
<BUG:19856029> - EMULEX FC HBA ERROR: 240: ADAPTER RESET FAILED.
<NOTE:1021223.1> - PCIEX-8000-0A - PCIEX subsystem problem
<NOTE:1356876.1> - Firmware Update Required. (A Manual Hba Reset Or Link Reset (Using Luxadm Or Fcadm) Is Required
<BUG:17331148> - REPEATED PCIEX-8000-0A REPORTED EVEN AFTER PCIEX HBA HAS BEEN REPLACED TWICE.
<BUG:25408165> - ERROR: 530: MAILBOX TIMEOUT.

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.