Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Troubleshooting Sure
Solution 1321278.1: Sun Enterprise[TM] 10000: Troubleshooting Domain Panics

Applies to:
Sun Enterprise 10000 Server - Version All Versions and later
Sun Enterprise 450 Server - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Purpose
This document provides troubleshooting information for various panics commonly seen on E10000 domains.

Troubleshooting Steps

Cannot allocate IOMMU TSB arrays
Symptom:
Boot device: /sbus@40,0/SUNW,qfe@0,8c00000 File and args:
Timeout waiting for ARP/RARP packet
Timeout waiting for ARP/RARP packet
Timeout waiting for ARP/RARP packet
Timeout waiting for ARP/RARP packet
Timeout waiting for ARP/RARP packet
23a00 X
Requesting Internet address for 0:0:be:a6:68:1b
Internet address is xxx.xxx.xx.xx = xxxxxxxx
hostname: chef
domainname: Lab.Sun.COM
root server: venus
root directory: /export/install/7/base.s998s_u3smccServer-11/Solaris_2.7/Tools/Boot
Alloc of 0x2140000 bytes at 0x10b80000 refused.
panic[cpu0]/thread=10404040: Cannot allocate IOMMU TSB arrays
rebooting...

Resolution:
The system is trying to boot Solaris 7, but Solaris 2.6 is specified in the domain_config file. Correct the domain_config file.

Fast Data Access MMU Miss
Symptom:
KERNEL dropped into OBP due to following trap at trap level = 1
Fast Data Access MMU Miss
      Normal    Alternate      MMU          Vector
0:    0         0              0            0
1:    0         1ff00000000    ffe916d0     40
2:    0         0              f1000000     12800003808
3:    0         0              0            0
4:    0         1fff0008d1c    0            1ff00000000
5:    0         f0008d1c       f            0
6:    0         0              fff00000     0

Resolution:
There are several likely causes:
1. The inetboot file in /tftpboot on the SSP is incorrect.
2. 400 MHz/8MB processors are involved and the boot image does not have the latest kernel patch.
3. dr-max-mem is set too large or incorrectly on Solaris 2.5.1.
4. A hardware problem. Run hpost -l32 or hpost -l64 on the domain. bringup -D on can also be run.

lock_set_spl: 70222069 lock held and only one CPU
Symptom:
Rebooting with command: boot net -v
It took 741 milli seconds to do mailbox callback
Boot device: /sbus@44,0/SUNW,qfe@0,8c00000 File and args: -v
Using Onboard transceiver - Link Up.
2ee00
Server IP address: xxx.xx.xxx.xx
Client IP address: xxx.xx.xxx.xx
Using Onboard transceiver - Link Up.
hostname: lima
domainname: dom1.something.com
root server: beans-ssp
root directory: /cdrom/sol_2_6_598_sparc_smcc_svr/s0/Solaris_2.6/Tools/Boot
Size: 335983+72325+449939 Bytes
cpu0: SUNW,UltraSPARC (upaid 4 impl 0x11 ver 0xa0 clock 400 MHz)
cpu1: SUNW,UltraSPARC (upaid 5 impl 0x11 ver 0xa0 clock 400 MHz)
cpu2: SUNW,UltraSPARC (upaid 6 impl 0x11 ver 0xa0 clock 400 MHz)
cpu3: SUNW,UltraSPARC (upaid 7 impl 0x11 ver 0xa0 clock 400 MHz)
It took 916 milli seconds to do mailbox callback
SunOS Release 5.6 Version Generic_105181-05 [UNIX(R) System V Release 4.0]
Copyright (c) 1983-1997, Sun Microsystems, Inc.
Using default device instance data
mem = 4194304K (0x100000000)
avail mem = 4156407808
panic[cpu4]/thread=0x10404040: lock_set_spl: 70222069 lock held and only one CPU
rebooting...
BAD TRAP: cpu=4 type=0x31 rp=0x104035b8 addr=0x17 mmu_fsr=0x0
: trap type = 0x31

Resolution:
The CPUs in the domain being booted have an 8MB cache, which the patch level of Solaris being booted cannot handle. Use the OBP command limit-ecache-size.

munged memory list
Symptom:
Boot device: /sbus@64,0/SUNW,hme@0,8c00000 File and args: - install
2ee00
hostname: foo
domainname: bag.com
root server: foobar
root directory: /export/install/sparc/os/2.6-598-419+/Solaris_2.6/Tools/Boot
panic[cpu38]/thread=0x10404000: munged memory list = 0x10403914

Resolution:
The system is trying to boot Solaris 2.6, but Solaris 2.5.1 is specified in the domain_config file. Correct the domain_config file.

Async data error at tl1
Symptom:
System panics with Async data error at tl1.

Resolution:
This generally indicates an E-cache parity error on a CPU. The SPARC Architecture Manual states:

  An asynchronous data error occurred on a data access. Examples: an ECC error occurred while writing data from a cache store buffer to memory, or an ECC error occurred on an MMU hardware table walk.

The panic string will report the failing CPU.
Replace the CPU reported.

Ecache SRAM Data Parity Error
Ecache Writeback Data Parity Error
UE Error: Ecache Copyout on CPUyy
Symptom:
System panics with one of the following:
  Ecache SRAM Data Parity Error
  Ecache Writeback Data Parity Error
  UE Error: Ecache Copyout on CPUyy

Resolution:
These are E-cache parity error panics caused by a CPU. Click here for details on which CPU needs replacement.
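
When scanning a saved console log, the E-cache related panic strings named in this document can be picked out mechanically. The following is a minimal sketch, not a supported tool: the function name find_ecache_panics is hypothetical, and treating "CPUyy" as the letters CPU followed by digits is an assumption about the real message format.

```python
import re
from typing import List

# Panic strings listed in this document that point at a CPU E-cache fault.
# "CPUyy" in the document is a placeholder; matching it as "CPU" plus digits
# is an assumption about the real message format.
ECACHE_PANIC_PATTERNS = [
    re.compile(r"Async data error at tl1"),
    re.compile(r"Ecache SRAM Data Parity Error"),
    re.compile(r"Ecache Writeback Data Parity Error"),
    re.compile(r"UE Error: Ecache Copyout on CPU\d+"),
]


def find_ecache_panics(console_log: str) -> List[str]:
    """Return the console lines that contain one of the E-cache panic strings."""
    hits = []
    for line in console_log.splitlines():
        if any(pattern.search(line) for pattern in ECACHE_PANIC_PATTERNS):
            hits.append(line.strip())
    return hits


if __name__ == "__main__":
    # Hypothetical log excerpt, for illustration only.
    sample = "\n".join([
        "panic[cpu4]/thread=0x10404040: Ecache SRAM Data Parity Error",
        "rebooting...",
    ])
    for hit in find_ecache_panics(sample):
        print(hit)
```

The sketch only flags matching lines. Deciding which CPU to replace still follows the resolution text above.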

kstat_q_exit: qlen == 0
Symptom:
System panics with kstat_q_exit: qlen == 0.

Resolution:
Check whether an EMC disk is attached to the domain. Solaris can overflow EMC's queues: EMC restricts tag queue depth and suggests reducing the default sd throttle. To reduce the throttle, add the following line to /etc/system:

set sd:sd_max_throttle=20
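
To confirm whether the throttle is already set on a domain, the contents of /etc/system can be checked for the sd_max_throttle entry. The following is a minimal sketch under stated assumptions: the function name find_sd_max_throttle_values is hypothetical, the text is passed in as a string rather than read from a live system, and values are parsed only as decimal or 0x-prefixed hexadecimal.

```python
import re
from typing import List

# Matches an /etc/system line such as: set sd:sd_max_throttle=20
SET_THROTTLE = re.compile(r"^\s*set\s+sd:sd_max_throttle\s*=\s*(\S+)")


def find_sd_max_throttle_values(etc_system_text: str) -> List[int]:
    """Return every sd_max_throttle value set in the given /etc/system text."""
    values = []
    for line in etc_system_text.splitlines():
        if line.lstrip().startswith("*"):  # '*' starts a comment in /etc/system
            continue
        match = SET_THROTTLE.match(line)
        if not match:
            continue
        try:
            # Base 0 accepts decimal and 0x-prefixed hexadecimal values.
            values.append(int(match.group(1), 0))
        except ValueError:
            # Skip values this sketch does not know how to parse.
            continue
    return values


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    sample = "* tuning for EMC storage\nset sd:sd_max_throttle=20\n"
    found = find_sd_max_throttle_values(sample)
    print("sd_max_throttle entries:", found)
```

An empty result means no active sd_max_throttle entry was found, so the default throttle would apply.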