Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure

Solution 1522934.1: Troubleshooting an Unbootable Netra T5440 After Panic(s) resulting in "ERROR: 1 CPUs in MD did not start"
Created from <SR 3-6646793652>

Applies to:
Sun Netra T5440 Server - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms
On: Netra T5440

panic: failed to stop cpu100
0x64 panic[cpu28]/thread=2a104c3dca0: xt_sync: timeout
panic[cpu0]/thread=180e000: cpu100 failed to start (2)
Cause
The issue can be seen and verified in the @persist@hostconsole.log file, gathered by the Snapshot utility of the resident ILOM installed on this machine:

panic: failed to stop cpu100
0x64 panic[cpu28]/thread=2a104c3dca0: xt_sync: timeout
000002a104c3cfe0 unix:xt_sync+2e8 (2a104c3d148, c, ef0423242cdff, c, 187c600, 60)
  %l0-3: 000002a104c3d0c8 000ef0438d9280ff 000ef0423241ac7f 000ef0438d9280af
  %l4-7: 00000000010b0000 000002a104c3d128 0000000000000000 00000000010b0120
000002a104c3d1d0 unix:hat_unload_callback+824 (3000c3bc008, 2a104c3d420, 0, 0, 0, 30001a61bc0)
  %l0-3: 000003000c446000 0000000000000001 0000000000000001 000003000c4407ff
  %l4-7: 000002a104c3d530 ffffffffffffffff ffffffffffffffff 000007007e5cedc0
000002a104c3d590 swrand:physmem_ent_gen+210 (70085038, 700000ef480, 0, 0, 0, 1000)
  %l0-3: 0000000000008de9 0000000000000000 000002a104c3d68c 0000000000001fff
  %l4-7: 0000000000002000 0000000000001000 0000000011bd2000 000000000000000d
000002a104c3d6f0 swrand:rnd_handler+14 (0, 2a104c3dca0, 0, 0, 70085028, 70085000)
  %l0-3: ffffffffffffffff ffffffffffffffff 0000000000000063 000006009f715d48
  %l4-7: 0000000000630000 0000000000000001 0000000000000000 000003000c840000
000002a104c3d7a0 genunix:callout_list_expire+5c (60095466fc0, 600954f2c00, 80000000, 0, bfffffffffffffff, 4000000000000000)
  %l0-3: 00000300cf92fc40 8000000000000000 000000000187c270 0000000000000008
  %l4-7: 0000000000000002 0000000000000010 000006009f64b5c8 000000007bfc3e74
000002a104c3d850 genunix:callout_expire+1c (60095466fc0, 60095467040, 185ed90, 185ed90, 0, 0)
  %l0-3: 00000600954f2c00 0000000000000004 0000000000000016 000002a101b51ca0
  %l4-7: 00000000158166a3 00000000018f4000 000003000c840178 000000003afed9b5
000002a104c3d900 genunix:callout_execute+c (60095466fc0, 6009f8b7a18, dbab91c, 0, f24f0d00, 0)
  %l0-3: 00000000018f5068 000003000c7702e0 0000000000000000 000006009f657338
  %l4-7: 000002a1012f9ca0 0000000000000001 000000000182b1e0 000000000182b1d8
000002a104c3d9b0 genunix:taskq_thread+300 (6009f657370, 6009f657308, cd18b6d0235d6, cd18b9e7aa50c, 6009f65733c, 6009f65733a)
  %l0-3: 000006009f8b7a18 000006009f657338 0000000000000002 0000000000080000
  %l4-7: 000006009f657328 0000000000010000 00000000fffeffff 000006009f657330

syncing file systems... done
dumping to /dev/md/dsk/d10, offset 10309140480, content: kernel
1% done 2% done 3% done 4% done ~SNIP~ 99% done 100% done
100% done: 888929 pages dumped, compression ratio 4.29, dump succeeded
rebooting...
Resetting...
ERROR: 1 CPUs in MD did not start
{0} ok boot
Boot device: rootdisk File and args:
SunOS Release 5.10 Version Generic_142900-14 64-bit
Copyright 1983-2010 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.
~SNIP~
panic[cpu0]/thread=180e000: cpu100 failed to start (2)
000000000180b8b0 unix:start_cpu+140 (64, 10196a0, 1835400, 1000, 8, 1000000000)
  %l0-3: 0000000001835630 0000000000000000 0000000fffffffff 00000000010ad400
  %l4-7: 0000000fffffffff 0000000000000001 0000000000000024 0000000001835628
000000000180b960 unix:start_other_cpus+1dc (1906400, 1, 0, 18634c8, 185fcd0, 186f748)
  %l0-3: 0000000000000064 0000000000000024 00000000010ad800 0000000000000000
  %l4-7: 0000000001906400 00000000010ad800 0000000001019400 000000000182b000
000000000180ba10 genunix:main+1e4 (1901800, 2, 185ed90, 18f87e8, 0, 18f4000)
  %l0-3: 000000000180c000 0000000001900c00 0000000050ec713e 0000000001900c00
  %l4-7: 0000000001866400 0000000000000001 0000000001900c00 0000000001901a68

syncing file systems... done
skipping system dump - no dump device configured
rebooting...
Resetting...
ERROR: 1 CPUs in MD did not start
Netra T5440, No Keyboard
Copyright (c) 1998, 2012, Oracle and/or its affiliates. All rights reserved.
OpenBoot 4.33.6, 130848 MB memory available, Serial #1234WXYZ.
Ethernet address 0:00:00:x0:xa:1x, Host ID: 9X9X9X9X9X.
Aborting auto-boot sequence.
{0} ok
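When checking whether a collected console log matches this document, it can help to search for the panic and boot error strings mechanically rather than by eye. The following is a minimal Python sketch, not an Oracle tool: the function name `scan_console_log` and the signature labels are hypothetical, and the patterns are based only on the messages quoted in this document.

```python
import re

# Patterns derived from the console messages quoted in this document.
# The dictionary keys are illustrative labels, not Oracle terminology.
SIGNATURES = {
    "stop_failure": re.compile(r"panic: failed to stop cpu(\d+)"),
    "xt_sync_timeout": re.compile(
        r"panic\[cpu(\d+)\]/thread=[0-9a-f]+: xt_sync: timeout"
    ),
    "start_failure": re.compile(
        r"panic\[cpu(\d+)\]/thread=[0-9a-f]+: cpu(\d+) failed to start"
    ),
    "md_cpu_error": re.compile(r"ERROR: (\d+) CPUs in MD did not start"),
}


def scan_console_log(text):
    """Return {label: [captured groups, ...]} for every signature found."""
    hits = {}
    for label, pattern in SIGNATURES.items():
        found = [m.groups() for m in pattern.finditer(text)]
        if found:
            hits[label] = found
    return hits


# Sample assembled from lines quoted in this document.
sample = """panic: failed to stop cpu100
panic[cpu28]/thread=2a104c3dca0: xt_sync: timeout
ERROR: 1 CPUs in MD did not start
panic[cpu0]/thread=180e000: cpu100 failed to start (2)
ERROR: 1 CPUs in MD did not start
"""

if __name__ == "__main__":
    for label, groups in scan_console_log(sample).items():
        print(label, groups)
```

In practice the text would be read from the hostconsole.log file in the Snapshot output instead of the inline sample. As a side note, 100 in decimal is 0x64 in hexadecimal, which is consistent with the `0x64` value in the log and the first argument (`64`) shown for `start_cpu`.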
The cause of this issue has been determined to be a fault on the Memory Mezzanine Board.

##### spos_logs/@var@log@plhwsvc.log #####
01/08/13 14:39:35: plat_hwsvc_rpc_svc.c:2586:hwsvc_get_enable_disable_state_1_svc():hwsvc_get_enable_disable_state_1_svc: /SYS/MB/CMP1/MCU2 failed
01/08/13 14:39:35: plat_hwsvc_rpc_svc.c:2586:hwsvc_get_enable_disable_state_1_svc():hwsvc_get_enable_disable_state_1_svc: /SYS/MB/CMP1/MCU3 failed
01/08/13 14:42:38: plat_hwsvc_rpc_svc.c:2586:hwsvc_get_enable_disable_state_1_svc():hwsvc_get_enable_disable_state_1_svc: /SYS/MB/CMP0/MCU2 failed
01/08/13 14:42:38: plat_hwsvc_rpc_svc.c:2586:hwsvc_get_enable_disable_state_1_svc():hwsvc_get_enable_disable_state_1_svc: /SYS/MB/CMP0/MCU3 failed
01/08/13 14:43:38: plat_hwsvc_rpc_svc.c:2586:hwsvc_get_enable_disable_state_1_svc():hwsvc_get_enable_disable_state_1_svc: /SYS/MB/CMP1/MCU2 failed
01/08/13 14:43:38: plat_hwsvc_rpc_svc.c:2586:hwsvc_get_enable_disable_state_1_svc():hwsvc_get_enable_disable_state_1_svc: /SYS/MB/CMP1/MCU3 failed
01/08/13 14:46:40: plat_hwsvc_rpc_svc.c:2586:hwsvc_get_enable_disable_state_1_svc():hwsvc_get_enable_disable_state_1_svc: /SYS/MB/CMP0/MCU2 failed
01/08/13 14:46:40: plat_hwsvc_rpc_svc.c:2586:hwsvc_get_enable_disable_state_1_svc():hwsvc_get_enable_disable_state_1_svc: /SYS/MB/CMP0/MCU3 failed
01/08/13 14:47:40: plat_hwsvc_rpc_svc.c:2586:hwsvc_get_enable_disable_state_1_svc():hwsvc_get_enable_disable_state_1_svc: /SYS/MB/CMP1/MCU2 failed
01/08/13 14:47:40: plat_hwsvc_rpc_svc.c:2586:hwsvc_get_enable_disable_state_1_svc():hwsvc_get_enable_disable_state_1_svc: /SYS/MB/CMP1/MCU3 failed

In the log above, Memory Control Units (MCUs) 2 and 3 are reported as "failed" on both the CMP0 and CMP1 processors. The properties output for those components shows the following additional details:

/SYS/MB/CMP0/MCU2
  Properties:
    type = Memory Controller
    component_state = (none)   <--- should say either "Enabled" or "Disabled"

/SYS/MB/CMP0/MCU3
  Properties:
    type = Memory Controller
    component_state = (none)   <--- and

/SYS/MB/CMP1/MCU2
  Properties:
    type = Memory Controller
    component_state = (none)   <---

/SYS/MB/CMP1/MCU3
  Properties:
    type = Memory Controller
    component_state = (none)   <---
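The plhwsvc.log entries quoted above repeat the same four component paths many times, so summarizing them per component makes the pattern easier to see. The sketch below is one way to do that in Python; the helper name `failed_components` is hypothetical, and the regular expression assumes the line layout matches the entries quoted in this document.

```python
import re
from collections import defaultdict

# Assumes the layout seen in the quoted log lines:
#   <MM/DD/YY HH:MM:SS>: <source location and function>: <component path> failed
FAILED_RE = re.compile(
    r"(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}): .*?: (/SYS/\S+) failed"
)


def failed_components(lines):
    """Map each component path to the timestamps at which it was logged as failed."""
    seen = defaultdict(list)
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            timestamp, component = match.groups()
            seen[component].append(timestamp)
    return dict(seen)


PREFIX = (
    "plat_hwsvc_rpc_svc.c:2586:hwsvc_get_enable_disable_state_1_svc():"
    "hwsvc_get_enable_disable_state_1_svc"
)

# A few entries copied from the log excerpt in this document.
sample_lines = [
    f"01/08/13 14:39:35: {PREFIX}: /SYS/MB/CMP1/MCU2 failed",
    f"01/08/13 14:39:35: {PREFIX}: /SYS/MB/CMP1/MCU3 failed",
    f"01/08/13 14:42:38: {PREFIX}: /SYS/MB/CMP0/MCU2 failed",
    f"01/08/13 14:42:38: {PREFIX}: /SYS/MB/CMP0/MCU3 failed",
    f"01/08/13 14:43:38: {PREFIX}: /SYS/MB/CMP1/MCU2 failed",
]

if __name__ == "__main__":
    for component, stamps in sorted(failed_components(sample_lines).items()):
        print(component, len(stamps), "entries; first at", stamps[0])
```

Seeing the same MCU numbers reported under more than one CMP, as in this case, is the pattern this document associates with the Memory Mezzanine Board fault.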
Verifying in the ... -> show -d properties -level all
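If the `show` output has been saved to a text file, the components whose `component_state` is neither "Enabled" nor "Disabled" can be listed with a short script. This is a rough Python sketch under assumptions: the function name `components_without_state` is hypothetical, and the parser assumes each component block starts with a line beginning with `/SYS/` followed by `name = value` property lines, as in the excerpt quoted in this document.

```python
VALID_STATES = ("Enabled", "Disabled")


def components_without_state(show_output):
    """Return component paths whose component_state is not Enabled or Disabled."""
    suspect = []
    current = None
    for raw in show_output.splitlines():
        line = raw.strip()
        if line.startswith("/SYS/"):
            # The first token is the component path; any trailing text is ignored.
            current = line.split()[0]
        elif line.startswith("component_state") and current and "=" in line:
            tokens = line.split("=", 1)[1].split()
            value = tokens[0] if tokens else ""
            if value not in VALID_STATES:
                suspect.append(current)
    return suspect


# Sample in the assumed layout; the last block is a made-up healthy component
# added only to show that it is not flagged.
sample_output = """
/SYS/MB/CMP0/MCU2
  Properties:
    type = Memory Controller
    component_state = (none)

/SYS/MB/CMP1/MCU3
  Properties:
    type = Memory Controller
    component_state = (none)

/SYS/MB/CMP0/MCU0
  Properties:
    type = Memory Controller
    component_state = Enabled
"""

if __name__ == "__main__":
    for path in components_without_state(sample_output):
        print("component_state missing or unexpected:", path)
```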
Solution
If the symptoms and log evidence above match the issue you are experiencing, engage the Oracle SPARC Hardware team by opening a new Service Request, either online through the My Oracle Support (MOS) portal or by calling 1-800-223-1711, option 2.
The following steps were provided for further onsite diagnosis by the field engineer:
Results from the field:

Attachments
This solution has no attachment