Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure
Solution 2340619.1 : SuperCluster: M8 IO domains may fail to boot with "ERROR: Last Trap: Fast Data Access MMU Miss"

IO domains on the SuperCluster M8 platform may fail to boot with the error "ERROR: Last Trap: Fast Data Access MMU Miss". This applies to M8 SuperClusters where Fortville cards (Part# 7319817 - Quad 10-Gigabit or Dual 40-Gigabit Ethernet QSFP+) are used.
Applies to:
Oracle SuperCluster M8 Hardware - Version All Versions to All Versions [Release All Releases]
Oracle Solaris on SPARC (64-bit)

Symptoms
IO domains deployed on an M8 SuperCluster where NIC cards (Part# 7319817 - Quad 10-Gigabit or Dual 40-Gigabit Ethernet QSFP+) are used may fail to boot with the following error on the IO domain console:

    # telnet 0 5001
    Trying 0.0.0.0...
    Connected to 0.
    Escape character is '^]'.
    Connecting to console "ssccn2-io-dbm02" in group "ssccn2-io-dbm02" ....
    Press ~? for control options ..
    NOTICE: Entering OpenBoot.
    NOTICE: Fetching Guest MD.
    NOTICE: Starting slave cpus.
    NOTICE: Initializing LDCs.
    NOTICE: Probing PCI devices.
    i40e_init_arq: Failed to write to Admin Rx Queue Regs
    ERROR: Last Trap: Fast Data Access MMU Miss
NOTE: This issue applies only to IO domains on M8 SuperClusters where NIC cards (Part# 7319817 - Quad 10-Gigabit or Dual 40-Gigabit Ethernet QSFP+) are used for the client (10G) network. FCode versions 3.9.0 and below are susceptible to this issue.
The above issue can be encountered in either of the following scenarios:

Scenario A: Starting IO domains in parallel using the "ldm start -a" command after a reboot of the root domain(s)
The issue is ONLY seen when the system has 9 or more Virtual Functions (VFs) consumed from a single Physical Function (PF) from a given root domain.

Example:
(a) Identify the PFs on the control or primary domain in a PDom (ex: ssccnX). This example verifies on ssccn3 (primary LDom on PDom2).
    # ldm ls-io | grep IOVNET| egrep 'primary|ssccn.-dom.'
(b) For each PF, verify how many VFs are created and consumed by IO domains.
    # ldm ls-io | grep CMIOU3 | grep PF0 | grep IOVNET | grep VF | grep ssccn.- | wc -l
Run step (b) for every PF in every CMIOU. This needs to be verified across all the primary domains in the SuperCluster rack (ex: ssccn1-4).

NOTE: If the count is 9 or more, then the system is susceptible to Scenario A.

Scenario B: Power-on of a PDom that has root domain(s) with IO domains deployed
The issue is ONLY seen when the system has 4 or more Virtual Functions (VFs) consumed from a single Physical Function (PF) from a given root domain. Refer to the above example to verify the number of VFs consumed for each PF.

NOTE: If the count is 4 or more, then the system is susceptible to Scenario B.
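
The per-PF check described above has to be repeated for every PF in every CMIOU, so it may be convenient to count all PFs in one pass. The following is a minimal sketch, not an Oracle-supplied tool. It assumes that each VF appears in the first column of the "ldm ls-io" output as the PF name followed by ".VF<n>" (consistent with the grep patterns in the example), and that consuming IO domains match the "ssccn.-" pattern. The function names count_vfs_per_pf and flag_susceptible_pfs are hypothetical.

```shell
#!/bin/sh
# Sketch: count consumed VFs per PF from `ldm ls-io` output read on stdin.
# Assumption: VF names in column 1 look like "<PF name>.VF<n>".

count_vfs_per_pf() {
    # Same filters as the manual example: network VFs owned by ssccn domains.
    grep IOVNET | grep 'ssccn.-' | awk '
        $1 ~ /\.VF[0-9]+$/ {
            pf = $1
            sub(/\.VF[0-9]+$/, "", pf)   # strip the VF suffix to get the PF name
            count[pf]++
        }
        END { for (pf in count) print count[pf], pf }
    ' | sort -k2
}

# Print only the PFs whose consumed-VF count is at or above the threshold
# (9 for Scenario A, 4 for Scenario B).
flag_susceptible_pfs() {
    threshold=$1
    count_vfs_per_pf | awk -v t="$threshold" '$1 >= t + 0 { print $2 ": " $1 " VFs" }'
}
```

On a primary domain this could be run as "ldm ls-io | flag_susceptible_pfs 9" for Scenario A or "ldm ls-io | flag_susceptible_pfs 4" for Scenario B. No output means no PF reached the threshold on that domain; the check still has to be repeated on each primary domain in the rack.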
Cause
The cause of the issue is being investigated under BUG 27133932 - i40e_init_arq: Failed to write to Admin Rx Queue Regs observed during ldm start

Solution
1. (Scenario A) If the issue is encountered while starting all IO domains in parallel after a reboot of the root domain(s), follow the steps below:
a. Stop all IO domains in the root domain.
    # ldm stop <IO-Domain>
b. Once all the IO domains are stopped, start the IO domains sequentially (one at a time).
    # ldm start <IO-Domain>
NOTE: Allow a 5-second delay before starting the next IO domain, and proceed until all the IO domains are started.
2. (Scenario B) If the issue is encountered during "power-on" of a PDom, where all the IO domains are started after power-on, follow the steps below:
a. Stop all IO domains in the PDom.
    # ldm stop <IO-Domain>
b. Reboot all the root domain(s) in the PDom. The order in which the root domains are rebooted doesn't matter.
    # reboot
c. Once the root domain is rebooted, log in and stop all IO domains in the root domain.
    # ldm stop <IO-Domain>
d. Once all the IO domains are stopped, start the IO domains sequentially (one at a time).
    # ldm start <IO-Domain>
NOTE: Allow a 5-second delay before starting the next IO domain, and proceed until all the IO domains are started.
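
The stop-then-start-sequentially step that both scenarios end with can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name restart_io_domains_sequentially and the LDM and START_DELAY variables are hypothetical conveniences (they are not part of the ldm CLI), and the list of IO domain names must be supplied by the administrator. Only "ldm stop <IO-Domain>" and "ldm start <IO-Domain>" from the steps above are used.

```shell
#!/bin/sh
# Sketch: stop the given IO domains, then start them one at a time.
# LDM and START_DELAY are hypothetical overrides, useful for a dry run
# (for example LDM=echo START_DELAY=0).

restart_io_domains_sequentially() {
    ldm_cmd=${LDM:-ldm}
    delay=${START_DELAY:-5}      # seconds to wait between starts

    # Stop every listed IO domain first; a failed stop is reported but not fatal.
    for domain in "$@"; do
        $ldm_cmd stop "$domain" || echo "warning: could not stop $domain" >&2
    done

    # Start the domains sequentially, pausing before each one after the first.
    first=1
    for domain in "$@"; do
        if [ "$first" -eq 0 ]; then
            sleep "$delay"
        fi
        first=0
        $ldm_cmd start "$domain" || return 1
    done
}
```

A call such as "restart_io_domains_sequentially <IO-Domain-1> <IO-Domain-2>" would be run on the root domain. For Scenario B it covers only steps c and d; the initial stop and the reboot of the root domain(s) still have to be performed beforehand.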
References
<BUG:27133932> - I40E_INIT_ARQ: FAILED TO WRITE TO ADMIN RX QUEUE REGS OBSERVED DURING LDM START

Attachments
This solution has no attachment