Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-75-1321278.1
Update Date: 2016-07-07
Keywords:

Solution Type  Troubleshooting Sure

Solution 1321278.1: Sun Enterprise[TM] 10000: Troubleshooting Domain Panics


Related Items
  • Sun Enterprise 450 Server
  • Sun Enterprise 10000 Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: SF-Exxk
  • _Old GCS Categories>Sun Microsystems>Servers>High-End Servers




In this Document
Purpose
Troubleshooting Steps
  Cannot allocate IOMMU TSB arrays
  Fast Data Access MMU Miss
  lock_set_spl: 70222069 lock held and only one CPU
  munged memory list
  Async data error at tl1
  Ecache SRAM Data Parity Error
  Ecache Writeback Data Parity Error
  UE Error: Ecache Copyout on CPUyy
  kstat_q_exit: qlen == 0


Applies to:

Sun Enterprise 10000 Server - Version All Versions and later
Sun Enterprise 450 Server - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Purpose

This document provides troubleshooting information for various panics commonly seen on E10000 domains.

Troubleshooting Steps

Cannot allocate IOMMU TSB arrays

Symptom:

  
Boot device: /sbus@40,0/SUNW,qfe@0,8c00000 File and args:
Timeout waiting for ARP/RARP packet
Timeout waiting for ARP/RARP packet
Timeout waiting for ARP/RARP packet
Timeout waiting for ARP/RARP packet
Timeout waiting for ARP/RARP packet
23a00 X
Requesting Internet address for 0:0:be:a6:68:1b
Internet address is xxx.xxx.xx.xx = xxxxxxxx
hostname: chef
domainname: Lab.Sun.COM
root server: venus
root directory: /export/install/7/base.s998s_u3smccServer-11/Solaris_2.7/Tools/Boot
Alloc of 0x2140000 bytes at 0x10b80000 refused.
panic[cpu0]/thread=10404040: Cannot allocate IOMMU TSB arrays
rebooting...
 

Resolution:

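The root directory path in the boot output (Solaris_2.7) shows that the system is trying to boot Solaris 7, but Solaris 2.6 is specified in the domain_config file. Correct the domain_config file so that it specifies the release being booted.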
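The release being booted can also be pulled from saved boot output and compared with the OS version the domain is configured for. The following Python sketch is a hypothetical helper, not an SSP tool: it does not read domain_config itself (that file's format is not covered here), so the configured version is passed in by the caller, and it assumes the boot image path contains a Solaris_<release> directory, as in the examples in this document. Solaris 7 and 8 appear as Solaris_2.7 and Solaris_8 in install paths, so "2.7" and "7" are treated as the same release.

```python
import re
from typing import Optional

# Hypothetical patterns matching boot output such as
# "root directory: .../Solaris_2.7/Tools/Boot"; not a documented interface.
ROOT_DIR_RE = re.compile(r"root directory:\s*(\S+)")
RELEASE_RE = re.compile(r"Solaris_(\d+(?:\.\d+)*)")


def normalize_release(version: str) -> str:
    """Treat '2.7'/'7' and '2.8'/'8' as equal; keep '2.5.1' and '2.6' as-is."""
    version = version.strip()
    if version.startswith("2."):
        rest = version[2:]
        major = rest.split(".")[0]
        if major.isdigit() and int(major) >= 7:
            return rest
    return version


def release_from_boot_log(boot_log: str) -> Optional[str]:
    """Return the Solaris release named in the root directory path, if any."""
    match = ROOT_DIR_RE.search(boot_log)
    if match is None:
        return None
    release = RELEASE_RE.search(match.group(1))
    return normalize_release(release.group(1)) if release else None


def release_mismatch(boot_log: str, configured_release: str) -> Optional[str]:
    """Describe a mismatch between the boot image and the configured release."""
    booting = release_from_boot_log(boot_log)
    configured = normalize_release(configured_release)
    if booting is None or booting == configured:
        return None
    return (f"boot image is Solaris {booting}, "
            f"but domain_config specifies Solaris {configured}")


if __name__ == "__main__":
    log = ("root server: venus\n"
           "root directory: /export/install/7/base.s998s_u3smccServer-11/"
           "Solaris_2.7/Tools/Boot\n")
    print(release_mismatch(log, "2.6"))
```

The same check applies to the munged memory list case, where a Solaris 2.6 image is booted on a domain configured for Solaris 2.5.1.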

 

 

Fast Data Access MMU Miss

Symptom:

  
KERNEL dropped into OBP due to following trap at trap level = 1
Fast Data Access MMU Miss
Normal Alternate MMU Vector
0: 0 0 0 0
1: 0 1ff00000000 ffe916d0 40
2: 0 0 f1000000 12800003808
3: 0 0 0 0
4: 0 1fff0008d1c 0 1ff00000000
5: 0 f0008d1c f 0
6: 0 0 fff00000 0
 

Resolution:

Likely causes include:

1. The inetboot file in /tftpboot on the SSP is incorrect.

2. The domain contains 400 MHz processors with 8 MB E-cache, and the boot image does not have the latest kernel patch.

3. dr-max-mem is set too large, or set incorrectly, on Solaris 2.5.1.

4. There is a hardware problem. Run hpost -l32 or hpost -l64 on the domain; bringup -D on can also be used.

 

 

lock_set_spl: 70222069 lock held and only one CPU

Symptom:

  
Rebooting with command: boot net -v
It took 741 milli seconds to do mailbox callback
Boot device: /sbus@44,0/SUNW,qfe@0,8c00000 File and args: -v
Using Onboard transceiver - Link Up.
2ee00
Server IP address: xxx.xx.xxx.xx
Client IP address: xxx.xx.xxx.xx
Using Onboard transceiver - Link Up.
hostname: lima
domainname: dom1.something.com
root server: beans-ssp
root directory: /cdrom/sol_2_6_598_sparc_smcc_svr/s0/Solaris_2.6/Tools/Boot
Size: 335983+72325+449939 Bytes
cpu0: SUNW,UltraSPARC (upaid 4 impl 0x11 ver 0xa0 clock 400 MHz)
cpu1: SUNW,UltraSPARC (upaid 5 impl 0x11 ver 0xa0 clock 400 MHz)
cpu2: SUNW,UltraSPARC (upaid 6 impl 0x11 ver 0xa0 clock 400 MHz)
cpu3: SUNW,UltraSPARC (upaid 7 impl 0x11 ver 0xa0 clock 400 MHz)
It took 916 milli seconds to do mailbox callback
SunOS Release 5.6 Version Generic_105181-05 [UNIX(R) System V Release 4.0]
Copyright (c) 1983-1997, Sun Microsystems, Inc.
Using default device instance data
mem = 4194304K (0x100000000)
avail mem = 4156407808
panic[cpu4]/thread=0x10404040: lock_set_spl: 70222069 lock held and only one CPU
rebooting...
BAD TRAP: cpu=4 type=0x31 rp=0x104035b8 addr=0x17 mmu_fsr=0x0
: trap type = 0x31
 

Resolution:

The CPUs in the domain being booted have an 8 MB E-cache, and the patch level of the Solaris release being booted cannot handle this cache size. Use the OBP command limit-ecache-size.

 

 

munged memory list

Symptom:

  
Boot device: /sbus@64,0/SUNW,hme@0,8c00000 File and args: - install
2ee00 hostname: foo
domainname: bag.com
root server: foobar
root directory: /export/install/sparc/os/2.6-598-419+/Solaris_2.6/Tools/Boot
panic[cpu38]/thread=0x10404000: munged memory list = 0x10403914
 

Resolution:

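The root directory path in the boot output (Solaris_2.6) shows that the system is trying to boot Solaris 2.6, but Solaris 2.5.1 is specified in the domain_config file. Correct the domain_config file so that it specifies the release being booted.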

 

Async data error at tl1

Symptom:

System panics with Async data error at tl1.

Resolution:

This generally indicates an E-cache parity error on a CPU. The SPARC Architecture Manual describes the trap as follows:

An asynchronous data error occurred on a data access. Examples: an ECC error occurred while writing data from a cache store buffer to memory, or an ECC error occurred on an MMU hardware table walk.

The panic string reports the failing CPU (cpu44 in the following example):

  
panic[cpu44]/thread=0x74714720
Async data error at tl1
 

Replace the reported CPU.
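To pick the CPU number out of saved console output, a short Python sketch such as the one below can be used. The split into system board and processor position is an assumption not stated in this document (E10000 CPU ID taken as system board * 4 + processor number); verify it against the domain's hardware configuration before a part is replaced. The function and field names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Matches the CPU number in a panic line such as "panic[cpu44]/thread=0x74714720".
PANIC_CPU_RE = re.compile(r"panic\[cpu(\d+)\]")


class CpuLocation(NamedTuple):
    cpu_id: int
    system_board: int  # ASSUMPTION: cpu_id // 4
    processor: int     # ASSUMPTION: cpu_id % 4


def failing_cpu(panic_text: str) -> Optional[CpuLocation]:
    """Return the CPU named in the panic string and its assumed board position."""
    match = PANIC_CPU_RE.search(panic_text)
    if match is None:
        return None
    cpu_id = int(match.group(1))
    # ASSUMPTION: CPU ID = system board number * 4 + processor number.
    # Confirm against the domain configuration before replacing hardware.
    board, proc = divmod(cpu_id, 4)
    return CpuLocation(cpu_id, board, proc)


if __name__ == "__main__":
    print(failing_cpu("panic[cpu44]/thread=0x74714720\nAsync data error at tl1"))
```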

 

 

Ecache SRAM Data Parity Error

Ecache Writeback Data Parity Error

UE Error: Ecache Copyout on CPUyy

Symptom:

System panics with one of the following messages (yy is the CPU number):

  • Ecache SRAM Data Parity Error
  • Ecache Writeback Data Parity Error
  • UE Error: Ecache Copyout on CPUyy

Resolution:

These are E-cache parity error panics caused by a CPU. See the related E-cache parity error document for details on which CPU needs replacement.
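When triaging saved console logs, the three messages in this section can be told apart mechanically. The Python sketch below is a hypothetical helper that reports which message matched and, for the Copyout message, the CPU number it names (assuming yy is a decimal CPU number). It deliberately does not decide which CPU to replace for the other two messages; that is covered by the separate document referenced above.

```python
import re
from typing import NamedTuple, Optional

# The three E-cache parity panic messages listed in this section.
SRAM_PARITY = "Ecache SRAM Data Parity Error"
WRITEBACK_PARITY = "Ecache Writeback Data Parity Error"
COPYOUT_UE = "UE Error: Ecache Copyout on CPU"
# ASSUMPTION: "yy" in "on CPUyy" is a decimal CPU number.
COPYOUT_CPU_RE = re.compile(r"UE Error: Ecache Copyout on CPU(\d+)")


class EcachePanic(NamedTuple):
    message: str
    cpu_id: Optional[int]  # set only when the message itself names a CPU


def classify_ecache_panic(panic_text: str) -> Optional[EcachePanic]:
    """Identify which E-cache parity panic, if any, appears in the text."""
    for message in (SRAM_PARITY, WRITEBACK_PARITY):
        if message in panic_text:
            return EcachePanic(message, None)
    match = COPYOUT_CPU_RE.search(panic_text)
    if match is not None:
        return EcachePanic(COPYOUT_UE, int(match.group(1)))
    if COPYOUT_UE in panic_text:
        return EcachePanic(COPYOUT_UE, None)
    return None
```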

 

 

kstat_q_exit: qlen == 0

Symptom:

System panics with kstat_q_exit: qlen == 0.

Resolution:

Check whether EMC disk storage is attached to the domain. Solaris can overflow the EMC array's command queues: EMC restricts the tagged command queue depth and recommends reducing the default sd throttle.

To reduce the throttle, add the following line to /etc/system and reboot the domain:

set sd:sd_max_throttle=20
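Where the same change has to be made on several domains, it can be scripted. The Python sketch below is a hypothetical helper that rewrites the text of /etc/system so that it contains exactly one active sd:sd_max_throttle line, replacing any earlier value; the value 20 is the one given above. Reading and writing the file, keeping a backup copy, and rebooting are left to the caller.

```python
import re

# Matches an active (uncommented) sd_max_throttle setting. Lines in /etc/system
# that start with "*" are comments and are left untouched.
THROTTLE_RE = re.compile(r"^\s*set\s+sd:sd_max_throttle\s*=")
COMMENT = "* Reduce sd tagged queue depth for EMC storage (kstat_q_exit panic)"


def set_sd_max_throttle(etc_system: str, value: int = 20) -> str:
    """Return /etc/system text with exactly one sd:sd_max_throttle setting."""
    # Drop any existing active setting and our own marker comment,
    # so repeated runs do not accumulate duplicate lines.
    kept = [line for line in etc_system.splitlines()
            if not THROTTLE_RE.match(line) and line != COMMENT]
    kept.append(COMMENT)
    kept.append(f"set sd:sd_max_throttle={value}")
    return "\n".join(kept) + "\n"
```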


  Copyright © 2018 Oracle, Inc.  All rights reserved.