Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1930869.1
Update Date:2018-04-04
Keywords:

Solution Type  Problem Resolution Sure

Solution  1930869.1 :   GM / vbsc.log flooded with "DEBUG: failed to check for data on "ipmi" LDC because it reset"  


Related Items
  • SPARC T5-8
  • SPARC T5-4
  • SPARC T5-2
  • SPARC T4-2
Related Categories
  • PLA-Support>Sun Systems>SPARC>CMT>SN-SPARC: T4

In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-9635121001>

Applies to:

SPARC T5-8 - Version All Versions and later
SPARC T4-2 - Version All Versions to All Versions [Release All Releases]
SPARC T5-4 - Version All Versions and later
SPARC T5-2 - Version All Versions and later
Information in this document applies to any platform.

Symptoms

In the initial case, the Service Processor firmware was upgraded from v8.5.0a to v8.5.1b using Patch 151296-02. When the Service Processor came back up after the reset, the host failed to boot. Downgrading the System Firmware back to v8.5.0a through the web interface did not help: the problem persisted and the host would not start. Power cycling the machine can clear the condition at first, but it is not a definitive fix.

T5 systems have also hit this problem, but for a different reason that may be related to firmware.

Cause

T4 systems:

The ipmi LDC was resetting frequently, which prevented the system from working normally.

In the initial case, the event list shows that the customer upgraded to 3.2.1.9.b and then switched back to 3.2.1.8.a, but the system still halted unexpectedly.

 ilom/@usr@local@bin@spshexec_show_-script_@X@logs@event@list.out

1245 Mon Sep 22 14:07:32 2014 System Log minor Host: OpenBoot Running
1243 Mon Sep 22 14:06:51 2014 System Log minor Host: Solaris halting <<<<<<<<<
1242 Mon Sep 22 13:48:59 2014 System Log minor Host: Solaris running <<<<<<<<
1241 Mon Sep 22 13:48:53 2014 System Log minor Host: Solaris booting
1236 Mon Sep 22 13:45:22 2014 System Log minor Host: HV started
1235 Mon Sep 22 13:45:21 2014 Chassis Action minor Inventory has been updated starting at node '/SYS/MB'
1234 Mon Sep 22 13:35:42 2014 System Log minor Host: Powered On
1232 Mon Sep 22 13:35:40 2014 System Log minor power button has been pressed
1231 Mon Sep 22 13:35:40 2014 System Log minor System power on has been requested via power button.
1230 Mon Sep 22 13:10:57 2014 Chassis Action minor Inventory has been updated starting at node '/SYS/MB'
1229 Mon Sep 22 13:10:56 2014 System Log minor Host: Powered Off
1228 Sun Sep 21 17:30:18 2014 Chassis Action minor Inventory has been updated starting at node '/SYS/MB'
1227 Sun Sep 21 17:30:18 2014 System Log minor Host: Powered Off
1226 Sun Sep 21 17:26:42 2014 System Log critical SP is about to reboot
1225 Sun Sep 21 17:26:42 2014 System Log major upgrade to version 3.2.1.8.a succeeded <<<<<<<<<<<<<<<<<<<<<<<<
1224 Sun Sep 21 06:20:31 2014 Chassis Action minor Inventory has been updated starting at node '/SYS/MB'
1223 Sun Sep 21 06:20:31 2014 System Log minor Host: Powered Off
1222 Sun Sep 21 06:17:07 2014 Power Reset major /SP has been reset by: Web session
1221 Sun Sep 21 06:16:25 2014 Chassis Action minor Inventory has been updated starting at node '/SYS/MB'
1220 Sun Sep 21 06:16:24 2014 System Log minor Host: Powered Off
1219 Sun Sep 21 06:14:10 2014 Power Off major Power to /SYS has been turned off by: Web session
1218 Sun Sep 21 06:14:10 2014 System Log minor Host: Host shutting down
1217 Sun Sep 21 06:12:16 2014 System Log minor Host: Solaris running
1212 Sun Sep 21 06:08:40 2014 System Log minor Host: HV started
1211 Sun Sep 21 06:08:38 2014 Chassis Action minor Inventory has been updated starting at node '/SYS/MB'
1210 Sun Sep 21 06:06:15 2014 System Log minor Host: Powered On
1208 Sun Sep 21 06:06:10 2014 Power On major Power to /SYS has been turned on by: SP, Reason: Firmware update complete
1207 Sun Sep 21 06:06:05 2014 Chassis Action minor Inventory has been updated starting at node '/SYS/MB'
1206 Sun Sep 21 06:06:05 2014 System Log minor Host: Powered Off
1205 Sun Sep 21 06:02:32 2014 System Log major upgrade to version 3.2.1.9.b succeeded <<<<<<<<<<<<<<<<<<<<<<
1204 Sun Sep 21 06:02:32 2014 Power Reset major /SP has been reset by: SP, Reason: Firmware update complete
1203 Sun Sep 21 05:59:37 2014 Chassis Action minor Inventory has been updated starting at node '/SYS/MB'
1202 Sun Sep 21 05:59:36 2014 System Log minor Host: Powered Off
1201 Sun Sep 21 05:57:23 2014 Power Off major Power to /SYS has been turned off by: SP, Reason: Firmware update started
1200 Sun Sep 21 05:57:23 2014 System Log minor Host: Host shutting down

...
1183 Fri May 2 15:16:31 2014 System Log minor Host: Solaris running
1178 Fri May 2 15:14:58 2014 System Log minor Host: HV started
1177 Fri May 2 15:14:56 2014 Chassis Action minor Inventory has been updated starting at node '/SYS/MB'
1176 Fri May 2 15:12:33 2014 System Log minor Host: Powered On
1174 Fri May 2 15:12:30 2014 Power On major Power to /SYS has been turned on by: Web session, Username:root
1173 Fri May 2 15:08:11 2014 Chassis Action minor Inventory has been updated starting at node '/SYS/MB'
1172 Fri May 2 15:08:11 2014 System Log minor Host: Powered Off
1171 Fri May 2 15:04:37 2014 System Log critical SP is about to reboot
1170 Fri May 2 15:04:37 2014 System Log major upgrade to version 3.2.1.8.a succeeded <<<<<<<<<<<<<<<<<<<<<<<<<
1169 Fri May 2 14:36:45 2014 Chassis Action minor Inventory has been updated starting at node '/SYS/MB'
1168 Fri May 2 14:36:45 2014 System Log minor Host: Powered Off
1167 Fri May 2 14:33:26 2014 Power Reset major /SP has been reset by: Web session
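
The firmware changes in an event list like the one above can be pulled out with a short script. The following is a minimal sketch, not an Oracle tool: the function name firmware_changes and the regular expressions are assumptions based only on the line layout shown in this excerpt (event ID, timestamp, then free text), and the timestamp parsing assumes an English (C) locale.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpt above:
#   <id> <Day Mon DD HH:MM:SS YYYY> <class> <severity> <message>
EVENT_RE = re.compile(
    r"^(?P<id>\d+)\s+"
    r"(?P<ts>\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\s+"
    r"(?P<rest>.*)$"
)
UPGRADE_RE = re.compile(r"upgrade to version (\S+) succeeded")


def firmware_changes(lines):
    """Return (timestamp, version) for each successful update, oldest first."""
    changes = []
    for line in lines:
        m = EVENT_RE.match(line.strip())
        if not m:
            continue  # blank lines, "..." markers, headers
        up = UPGRADE_RE.search(m.group("rest"))
        if up:
            # Normalize spacing so single-digit days parse cleanly.
            ts_text = " ".join(m.group("ts").split())
            ts = datetime.strptime(ts_text, "%a %b %d %H:%M:%S %Y")
            changes.append((ts, up.group(1)))
    return sorted(changes)


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python fw_changes.py event_list.out
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for ts, version in firmware_changes(fh):
            print(ts.isoformat(sep=" "), version)
```

Run against the excerpt above, this would list the three "upgrade to version ... succeeded" entries in chronological order, which makes the 3.2.1.8.a, 3.2.1.9.b, 3.2.1.8.a sequence easy to see.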

* The ipmi LDC reset messages are logged repeatedly (this is only an excerpt):

/cores/3-9635121001/tds-2014-09-22/ORACLESP-AK00099747_AK00099747_2014-09-22T16-25-17/ilom/@persist@vbsc@vbsc0.log

Sep 20 19:02:30 DEBUG: failed to check for data on "ipmi" LDC because it reset
Sep 20 19:02:30 DEBUG: failed to check for data on "ipmi" LDC because it reset
Sep 20 19:04:21 DEBUG: failed to check for data on "ipmi" LDC because it reset
Sep 20 19:06:51 DEBUG: failed to check for data on "ipmi" LDC because it reset
Sep 20 19:09:12 DEBUG: failed to check for data on "ipmi" LDC because it reset
Sep 20 19:09:12 DEBUG: failed to check for data on "ipmi" LDC because it reset
Sep 20 19:09:15 DEBUG: failed to check for data on "ipmi" LDC because it reset
Sep 20 19:15:54 DEBUG: failed to read data from "ipmi" LDC because it reset

* These messages are still being logged in the vbsc log after the switch back to 3.2.1.8.a:

/cores/3-9635121001/tds-2014-09-22/ORACLESP-AK00099747_AK00099747_2014-09-22T16-25-17/ilom/@persist@vbsc@vbsc0.log

Sep 22 16:01:05 DEBUG: failed to check for data on "ipmi" LDC because it reset
Sep 22 16:07:44 DEBUG: failed to check for data on "ipmi" LDC because it reset
Sep 22 16:07:46 DEBUG: failed to check for data on "ipmi" LDC because it reset
Sep 22 16:07:50 DEBUG: failed to check for data on "ipmi" LDC because it reset
Sep 22 16:07:51 DEBUG: failed to check for data on "ipmi" LDC because it reset
Sep 22 16:12:03 DEBUG: failed to check for data on "ipmi" LDC because it reset
Sep 22 16:12:04 DEBUG: failed to read data from "ipmi" LDC because it reset
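
To judge how often the channel is resetting, the messages can be counted per day. The following is a minimal sketch under the assumption that every relevant line has the exact layout shown in the excerpts above; the function name count_ldc_resets is hypothetical and not part of any Oracle tool.

```python
import re
from collections import Counter

# Matches both message variants seen in the excerpts:
#   failed to check for data on "ipmi" LDC because it reset
#   failed to read data from "ipmi" LDC because it reset
LDC_RESET_RE = re.compile(
    r'^(?P<day>\w{3}\s+\d{1,2}) \d{2}:\d{2}:\d{2} DEBUG: failed to '
    r'(?:check for data on|read data from) "(?P<channel>[^"]+)" '
    r'LDC because it reset'
)


def count_ldc_resets(lines, channel="ipmi"):
    """Count LDC reset messages per day ("Mon DD") for one channel."""
    counts = Counter()
    for line in lines:
        m = LDC_RESET_RE.match(line.strip())
        if m and m.group("channel") == channel:
            day = " ".join(m.group("day").split())  # normalize spacing
            counts[day] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python ldc_resets.py vbsc0.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for day, n in count_ldc_resets(fh).items():
            print(f"{day}: {n} reset message(s)")
```

The log lines carry no year, so counts are keyed on month and day only; a log spanning more than one year would need extra handling.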

* The other components are reported OK according to the snapshot output:

##### ilom/@usr@local@bin@collect_properties.out #####

---------- FRU --------- - Part No - ----- PPart No ----- ----- Serial # ----- --- Mfg ---- ------ Product ----- Status
FB 7051522-01 464507N+1316HH00CE 9615 HON HAI FAN_BD OK
MB 7049060-02 465769T+1314TF0L3E Celestica Ho OK
P0/M0 7051516-01 465769T+1317TB0G64 Celestica Ho MEM_RISER OK
P0/M0/B0/C0/D0 7042208-01 HMT31GR7CFR4A-PB 00AD011315156F11BE Hynix Semico 8192MB DDR3 SDRAM DI Enabled OK
P0/M0/B0/C1/D0 7042208-01 HMT31GR7CFR4A-PB 00AD011315155F11B4 Hynix Semico 8192MB DDR3 SDRAM DI Enabled OK
P0/M0/B1/C0/D0 7042208-01 HMT31GR7CFR4A-PB 00AD011315158F118E Hynix Semico 8192MB DDR3 SDRAM DI Enabled OK
P0/M0/B1/C1/D0 7042208-01 HMT31GR7CFR4A-PB 00AD011315158F11A0 Hynix Semico 8192MB DDR3 SDRAM DI Enabled OK
P0/M1 7051516-01 465769T+1317TB0G3R Celestica Ho MEM_RISER OK
P0/M1/B0/C0/D0 7042208-01 HMT31GR7CFR4A-PB 00AD011315155F1190 Hynix Semico 8192MB DDR3 SDRAM DI Enabled OK
P0/M1/B0/C1/D0 7042208-01 HMT31GR7CFR4A-PB 00AD011315158F11A4 Hynix Semico 8192MB DDR3 SDRAM DI Enabled OK
P0/M1/B1/C0/D0 7042208-01 HMT31GR7CFR4A-PB 00AD011315156F119C Hynix Semico 8192MB DDR3 SDRAM DI Enabled OK
P0/M1/B1/C1/D0 7042208-01 HMT31GR7CFR4A-PB 00AD011315151F119B Hynix Semico 8192MB DDR3 SDRAM DI Enabled OK
P1/M0 7051516-01 465769T+1317TB0G4H Celestica Ho MEM_RISER OK
P1/M0/B0/C0/D0 7042208-01 HMT31GR7CFR4A-PB 00AD011315156F11C6 Hynix Semico 8192MB DDR3 SDRAM DI Enabled OK
P1/M0/B0/C1/D0 7042208-01 HMT31GR7CFR4A-PB 00AD011315155F11BE Hynix Semico 8192MB DDR3 SDRAM DI Enabled OK
P1/M0/B1/C0/D0 7042208-01 HMT31GR7CFR4A-PB 00AD011315151F11B5 Hynix Semico 8192MB DDR3 SDRAM DI Enabled OK
P1/M0/B1/C1/D0 7042208-01 HMT31GR7CFR4A-PB 00AD011315158F119C Hynix Semico 8192MB DDR3 SDRAM DI Enabled OK
P1/M1 7051516-01 465769T+1317TB0G54 Celestica Ho MEM_RISER OK
P1/M1/B0/C0/D0 7042208-01 HMT31GR7CFR4A-PB 00AD011315151F11BF Hynix Semico 8192MB DDR3 SDRAM DI Enabled OK
P1/M1/B0/C1/D0 7042208-01 HMT31GR7CFR4A-PB 00AD011315154F119F Hynix Semico 8192MB DDR3 SDRAM DI Enabled OK
P1/M1/B1/C0/D0 7042208-01 HMT31GR7CFR4A-PB 00AD011315156F11BC Hynix Semico 8192MB DDR3 SDRAM DI Enabled OK
P1/M1/B1/C1/D0 7042208-01 HMT31GR7CFR4A-PB 00AD011315157F11BC Hynix Semico 8192MB DDR3 SDRAM DI Enabled OK
MB/SP 7054434-01 4A003EH+1320XX0281 Celestica Ho OK
SASBP 511-1246-53 0226LHF-1304A901C3 9615 HON HAI DISK_BP OK
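
A listing like the one above can be scanned for FRUs that are not reported OK. This is a minimal sketch that assumes the status is always the last whitespace-separated field of each row and that header lines start with "-" or "#", as in this excerpt; the function name frus_not_ok is hypothetical.

```python
def frus_not_ok(lines):
    """Return the FRU names whose last column is anything other than OK."""
    not_ok = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and the "#####" / "----------" header lines.
        if not fields or fields[0].startswith(("-", "#")):
            continue
        if fields[-1] != "OK":
            not_ok.append(fields[0])
    return not_ok


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python fru_check.py collect_properties.out
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        bad = frus_not_ok(fh)
    print("All listed FRUs report OK" if not bad else "Not OK: " + ", ".join(bad))
```

For the rows shown above the function returns an empty list, which matches the statement that the other components are reported OK.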
 

=================================================

T5 Systems:

Engineering is investigating this issue, but it appears to be benign.

Solution

T4 systems:
The communication between the Service Processor and the hypervisor is not working reliably, regardless of the firmware version installed. The recommended action plan is the following:

1) Reseat the Service Processor (especially on a newly delivered system).
2) Replace the Service Processor; see "How to Replace a SPARC T4-2 or Netra T4-2 Service Processor" (Doc ID 1415592.1).
3) Monitor the system for a couple of days
4) If unstable, send a fresh snapshot for further analysis.

T5 systems:

This problem is resolved by firmware 9.4.2.b or newer (Nyx 1.5.x).
Related to bug 19339165.
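
Whether a given T5 system already has the fix can be checked by comparing its firmware version string with 9.4.2.b. The following is a rough sketch only. The ordering rule is an assumption, not a documented Oracle scheme: numeric components are compared first, then the trailing letter alphabetically, and a version with no letter sorts before the same version with one. The function names are hypothetical, and the installed version string must be obtained separately.

```python
import re

# Accepts forms such as "9.4.2.b", "9.4.2b", "v9.4.2.b" or "9.5.0".
VERSION_RE = re.compile(r"^v?(\d+(?:\.\d+)*)\.?([a-z]*)$")


def parse_fw_version(text):
    """Split a firmware version into (numeric tuple, letter suffix)."""
    m = VERSION_RE.match(text.strip().lower())
    if not m:
        raise ValueError(f"unrecognized firmware version: {text!r}")
    numbers = tuple(int(part) for part in m.group(1).split("."))
    return numbers, m.group(2)


def has_fix(installed, fixed_in="9.4.2.b"):
    """True if 'installed' is at or above 'fixed_in' under the assumed ordering."""
    return parse_fw_version(installed) >= parse_fw_version(fixed_in)


if __name__ == "__main__":
    for version in ("9.3.1.c", "9.4.2.a", "9.4.2.b", "9.5.0"):
        print(version, "has fix" if has_fix(version) else "needs update")
```

Because the suffix ordering is assumed, a borderline result (same numeric version, different letter) should be confirmed against the firmware release notes.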

References

<NOTE:1662204.1> - SPARC T5,M5&M6: SPSUN4V-8000-DE: fault.memory.dq: Memory Fault

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.