Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution

2329333.1: Sun SPARC Enterprise Mx000 Server: How to Troubleshoot XSCFU WDT Event SCF-8005-NE Due to Error "Recovery of wbuf failed due to a second write error"
Oracle Confidential PARTNER - Available to partners (SUN).

Applies to:

Sun SPARC Enterprise M9000-64 Server - Version All Versions to All Versions [Release All Releases]
Sun SPARC Enterprise M5000 Server - Version All Versions to All Versions [Release All Releases]
Sun SPARC Enterprise M3000 Server - Version All Versions to All Versions [Release All Releases]
Sun SPARC Enterprise M4000 Server - Version All Versions to All Versions [Release All Releases]
Sun SPARC Enterprise M8000 Server - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

The goal of this document is to help in understanding situations where the XSCFU reports a process down and the Linux messages file (spos_log/*messages*) shows the signatures below. ECC errors may be returned when data is read back from the write buffer (second write errors). The XSCFU may hang, or it may reboot and recover.

showlogs error -v reports an entry like this:

    Date: Sep 21 14:13:49 EST 2017    Code: 40000000-faffc201-011d000200000000
        Status: Information    Occurred: Sep 21 14:13:48.700 EST 2017
        FRU: /FIRMWARE,/XSCFU
        Msg: XSCF process down detected
        Diagnostic Code:
            00000000 00000000 00000000 66666666
            2e736364 622e3230 31373039 00000000
            00000000 00000000 00000000
        UUID: 9521c18b-3686-4d7f-bf94-c62b744d86f2
        MSG-ID: SCF-8005-NE

FMA reports the following signature:

    XSCF> fmdump -v
    TIME                 UUID                                 MSG-ID
    Sep 21 14:06:47.8206 fa4ada6a-29fc-4e9c-b851-1213fa94f3dd SCF-8005-NE
      100%  defect.chassis.software

            Problem in: hc:///chassis=0/xcp=0

The XSCF monitor log ('showlogs monitor') will contain an informational message similar to this:

    Sep 21 14:13:58 ##Hostname## Information: /FIRMWARE,/XSCFU:SCF:XSCF process down detected
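The Diagnostic Code in the showlogs error -v entry is printed as 32-bit hex words, and some of those words may carry text. The sketch below is a hypothetical helper (`decode_diag_words` is not an XSCF command) that decodes each non-zero word as ASCII, assuming the bytes are printed in memory order. For the entry above it yields the string `ffff.scdb.201709`; this note does not document what that string means, so treat it only as a hint when comparing events.

```python
# Minimal sketch: reveal printable ASCII embedded in an XSCF "Diagnostic Code".
# The field layout is NOT documented in this note; this only decodes bytes,
# it does not interpret them.


def decode_diag_words(diag_code: str) -> str:
    """Return the ASCII rendering of all non-zero hex words.

    All-zero words are skipped as padding; any byte outside the printable
    ASCII range (0x20-0x7e) is rendered as '.'.
    """
    chars = []
    for word in diag_code.split():
        raw = bytes.fromhex(word)  # 8 hex digits -> 4 bytes, in printed order
        if raw == b"\x00" * len(raw):  # skip padding words
            continue
        chars.append("".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in raw))
    return "".join(chars)


if __name__ == "__main__":
    # Diagnostic Code copied from the showlogs error -v entry in this note.
    diag = ("00000000 00000000 00000000 66666666 2e736364 622e3230 "
            "31373039 00000000 00000000 00000000 00000000")
    print(decode_diag_words(diag))  # ffff.scdb.201709
```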
The following signature will be reported in the Linux messages file (spos_log/*messages*):

    Sep 21 19:01:47 (none) kernel: JFFS2:1506020443.308518:scf_init(106):[06 /hcpc/tmp]:mtd->read(0x1facc bytes from 0x4ca0534) returned ECC error
    Sep 21 19:01:47 (none) kernel: JFFS2:1506020452.820616:jffs2_gcd_mtd6(185):[06 /hcpc/tmp]:mtd->read(0x232 bytes from 0x4cb57ac) returned ECC error
    Sep 21 19:01:05 (none) portmap: portmap startup succeeded
    Sep 21 19:01:47 (none) kernel: JFFS2:1506020493.470935:jffs2_gcd_mtd6(185):[06 /hcpc/tmp]:mtd->read(0x1000 bytes from 0x4ca5064) returned ECC error
    Sep 21 19:01:47 (none) kernel: JFFS2:1506020493.476174:jffs2_gcd_mtd6(185):[06 /hcpc/tmp]:mtd->read(0x1000 bytes from 0x4ca0840) returned ECC error
    Sep 21 19:01:47 (none) kernel: JFFS2:1506020506.662663:exe(338):[06 /hcpc/tmp]:mtd->read(0x44 bytes from 0x4ca4f7c) returned ECC error
    Starting pid 360, console /dev/console: '/scf/init/scf_stop'
    Sep 21 19:02:39 (none) exiting on signal 15

Cause

Check the Linux messages file (spos_logs/@var@log@messages*) and the dmesg file (spos_logs/@scf@bin@*dmesg*) for ECC errors and related JFFS2 data CRC mismatches, for example:

    Sep 17 10:28:02 ##Hostname## kernel: JFFS2:1505662082.875260:dbs(379):[06 /hcpc/tmp]:Data CRC 67713f1a != calculated CRC 1fcd8730 for node at 04cb0a68
    Sep 17 10:28:03 ##Hostname## kernel: JFFS2:1505662083.028197:dbs(379):[06 /hcpc/tmp]:Data CRC 67713f1a != calculated CRC 1fcd8730 for node at 04cb0a68
    Sep 17 10:28:03 ##Hostname## kernel: JFFS2:1505662083.182062:dbs(379):[06 /hcpc/tmp]:Data CRC 67713f1a != calculated CRC 1fcd8730 for node at 04cb0a68
    Sep 17 10:28:03 ##Hostname## kernel: JFFS2:1505662083.334375:dbs(379):[06 /hcpc/tmp]:Data CRC 67713f1a != calculated CRC 1fcd8730 for node at 04cb0a68

    Sep 17 10:26:30 ##Hostname## kernel: Recovery of wbuf failed due to a second write error
    Sep 17 10:26:30 ##Hostname## kernel: Write of 1381 bytes at 0x04cb6db0 failed. returned -5, retlen 0
    Sep 17 10:26:30 ##Hostname## kernel: Not marking the space at 0x04cb6db0 as dirty because the flash driver returned retlen zero
    Sep 17 10:26:30 ##Hostname## kernel: verify buffer:e6a1 e681
    Sep 17 10:26:30 ##Hostname## kernel: jffs2_flush_wbuf(): Write failed with -5
    Sep 17 10:26:30 ##Hostname## kernel: verify buffer:7288 6288
    Sep 17 10:26:30 ##Hostname## kernel: Recovery of wbuf failed due to a second write error   ====>> second write error
    Sep 17 10:26:30 ##Hostname## kernel: Write of 1381 bytes at 0x03b20000 failed. returned -5, retlen 0
    Sep 17 10:26:30 ##Hostname## kernel: Not marking the space at 0x03b20000 as dirty because the flash driver returned retlen zero
    Sep 17 10:26:30 ##Hostname## kernel: verify buffer:e6a1 e681
    Sep 17 10:26:30 ##Hostname## kernel: jffs2_flush_wbuf(): Write failed with -5
    Sep 17 10:26:30 ##Hostname## kernel: verify buffer:7288 6288
    Sep 17 10:26:30 ##Hostname## kernel: Recovery of wbuf failed due to a second write error   ====>> second write error
    Sep 17 10:26:30 ##Hostname## kernel: Write of 1381 bytes at 0x03b20000 failed. returned -5, retlen 0
    Sep 17 10:26:30 ##Hostname## kernel: Not marking the space at 0x03b20000 as dirty because the flash driver returned retlen zero
    Sep 17 10:26:30 ##Hostname## kernel: verify buffer:e6a1 e681
    Sep 17 10:26:30 ##Hostname## kernel: jffs2_flush_wbuf(): Write failed with -5
    Sep 17 10:26:30 ##Hostname## kernel: verify buffer:7288 6288
    JFFS2:1506021043.098011:jffs2_gcd_mtd13(191):[16 /hcpc/scflog2]:start gc thread.
    JFFS2 error statistics:

This may lead to the system failing to recover from the failed write operation. This is also visible in the Linux messages file (spos_logs/@var@log@messages*) and the dmesg file (spos_logs/@scf@bin@*dmesg*):

    -bash-3.2$ grep second @var@log@messages
    Sep 17 10:28:03 ##Hostname## kernel: Recovery of wbuf failed due to a second write error

Solution

An XSCFU that logs "Recovery of wbuf failed due to a second write error" can run into a watchdog (WDT) timeout situation. Replacing the XSCFU hardware resolves the issue.

Bug 15632749: SUNBT6938935 Watchdog timeout situations due to recovery of wbuf failure

References

NOTE:1942533.1 - M-Series Servers: XSCF watchdog timeout without auto negotiation on Ethernet port
NOTE:1021929.1 - SCF-8005-NE - XSCF firmware is defective.
NOTE:1339399.1 - Automated Diagnosis Requirements and Expectations for SPARC Servers
NOTE:2097446.1 - SRDC – SPARC Mx000 and M10/M12 systems: Simple Instructions to Collect an XCP Snapshot
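The checks described under Cause can be run against the log files of an extracted XCP snapshot (collected as in NOTE:2097446.1). The sketch below is a hypothetical helper, not an Oracle tool: it counts the signatures quoted in this note, compares the two words on each `verify buffer:` line, and groups the failing flash offsets by erase block. Two assumptions, neither stated in this note, are flagged in the code: that `verify buffer:` prints the word written followed by the word read back, and that the erase block size is 128 KiB (0x20000).

```python
import re
from collections import Counter

# Hypothetical triage helper, not an Oracle tool. Feed it the text of
# spos_logs/@var@log@messages* or the dmesg file from an extracted XCP snapshot.

# Log signatures quoted in this note.
SIGNATURES = {
    "second_write_error": "Recovery of wbuf failed due to a second write error",
    "ecc_read_error": "returned ECC error",
    "data_crc_mismatch": "Data CRC",
}

# ASSUMPTION: "verify buffer:XXXX YYYY" prints the word written, then the word
# read back. This note does not define the message.
VERIFY_RE = re.compile(r"verify buffer:([0-9a-fA-F]{4}) ([0-9a-fA-F]{4})")

# Flash offsets from failed reads, failed writes and CRC-mismatched nodes.
OFFSET_RE = re.compile(r"(?:bytes from 0x|bytes at 0x|for node at )([0-9a-fA-F]+)")

# ASSUMPTION: 128 KiB erase blocks. Confirm against the actual flash part.
ERASE_BLOCK_SIZE = 0x20000


def count_signatures(log_text: str) -> dict:
    """Count the lines that contain each known signature."""
    lines = log_text.splitlines()
    return {name: sum(sig in line for line in lines)
            for name, sig in SIGNATURES.items()}


def flipped_bits(log_text: str) -> list:
    """Return (written, read_back, number_of_differing_bits) per verify line."""
    pairs = []
    for written, read_back in VERIFY_RE.findall(log_text):
        diff = int(written, 16) ^ int(read_back, 16)
        pairs.append((written, read_back, bin(diff).count("1")))
    return pairs


def offsets_by_erase_block(log_text: str) -> Counter:
    """Count error offsets per assumed erase block, keyed by block start."""
    blocks = Counter()
    for offset in OFFSET_RE.findall(log_text):
        start = int(offset, 16) // ERASE_BLOCK_SIZE * ERASE_BLOCK_SIZE
        blocks[hex(start)] += 1
    return blocks


if __name__ == "__main__":
    # Excerpts copied from the log output in this note.
    sample = """\
Sep 17 10:28:02 ##Hostname## kernel: JFFS2:1505662082.875260:dbs(379):[06 /hcpc/tmp]:Data CRC 67713f1a != calculated CRC 1fcd8730 for node at 04cb0a68
Sep 17 10:26:30 ##Hostname## kernel: Recovery of wbuf failed due to a second write error
Sep 17 10:26:30 ##Hostname## kernel: Write of 1381 bytes at 0x04cb6db0 failed. returned -5, retlen 0
Sep 17 10:26:30 ##Hostname## kernel: verify buffer:e6a1 e681
Sep 17 10:26:30 ##Hostname## kernel: verify buffer:7288 6288
Sep 21 19:01:47 (none) kernel: JFFS2:1506020443.308518:scf_init(106):[06 /hcpc/tmp]:mtd->read(0x1facc bytes from 0x4ca0534) returned ECC error
"""
    print(count_signatures(sample))
    # {'second_write_error': 1, 'ecc_read_error': 1, 'data_crc_mismatch': 1}
    print(flipped_bits(sample))
    # [('e6a1', 'e681', 1), ('7288', '6288', 1)]
    print(offsets_by_erase_block(sample))
    # Counter({'0x4ca0000': 3})
```

On the excerpts in this note, each `verify buffer:` pair differs in exactly one bit, and the read, write and CRC failures at 0x4ca0534, 0x04cb6db0 and 0x04cb0a68 fall within one 128 KiB region starting at 0x4ca0000 (under the assumed erase size). Repeated errors concentrated in a small area of flash are what one would expect from a localized flash defect, which fits the XSCFU replacement given under Solution; treat the output as supporting evidence, not a diagnosis.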