Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Solution Type: Technical Instruction Sure Solution

1020467.1 : Sun Fire 3800/4800/4810/6800/E2900/E4900/E6900/V1280 and Netra[TM] 1280/1290 Server: How to manage "Unable to send ECC event message to System Controller" messages

Previously Published As: 259008

Applies to:
Sun Fire E6900 Server - Version Not Applicable and later
Sun Netra 1280 Server - Version Not Applicable and later
Sun Fire V1280 Server - Version Not Applicable and later
Sun Fire 3800 Server - Version Not Applicable and later
Sun Fire 4800 Server - Version Not Applicable and later
All Platforms
Please look at the Product(s) section for a full list of products.

Goal

Description
How to deal with "Unable to send ECC event message to System Controller" messages such as the following:

    May 10 03:10:28 system sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
    May 10 03:11:28 system last message repeated 2 times
    May 10 03:11:53 system sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
    May 10 03:11:58 system sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
    May 10 03:13:58 system last message repeated 5 times
    May 10 03:14:12 system sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
    May 10 03:14:28 system sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
    May 10 03:17:28 system last message repeated 6 times
    May 10 03:17:29 system sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
    May 10 03:17:58 system sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
    May 10 03:18:28 system last message repeated 1 time
    May 10 03:18:58 system sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
    May 10 03:18:58 system sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
    May 10 03:24:58 system last message repeated 13 times
    May 10 03:25:25 system sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
    May 10 03:25:28 system sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
    May 10 03:28:58 system last message repeated 8 times
    May 10 03:29:27 system sgsbbc: [ID 428960 kern.notice] NOTICE: Unable to send ECC event message to System Controller
    May 10 03:29:28 system sgsbbc: [ID 538587 kern.notice] NOTICE: Timed out sending message to SC
    May 10 03:31:58 system last message repeated 5 times

Solution

Cause
These messages are caused by a flood of errors, for example a DIMM generating many hundreds or thousands of CEs (Correctable Errors). The flood of errors on the domain is more than the domain-to-SC data path can handle.

Note: The CE flood or storm is sometimes caused by FMA not retiring pages correctly.
It is important to install the latest FMA patches, for example:
Patch 139572-02 SunOS[TM] 5.10: fmd patch (or later), which fixes Sun CR 6714311 (Updated, P2, fma/mem) "fmstat seems to hang after/during CE storm". This bug causes page retirement to malfunction.
Also, Patch 120011-14 SunOS 5.10: kernel patch (or later) and Patch 125369-12 SunOS 5.10: Fault Manager patch (or later) should be installed to avoid known issues that can lead to this condition.

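As an illustration of the patch check described above, the following is a minimal sketch that compares installed patch revisions against the minimum revisions named in this document. It assumes the input is text captured from `showrev -p` on the domain, with each entry starting with `Patch: NNNNNN-RR`; the function names are hypothetical. It does not account for a patch that has been obsoleted by a later patch with a different ID, so a reported gap should be confirmed manually.

```python
import re

# Minimum revisions named in this document (patch ID -> revision).
REQUIRED = {"139572": 2, "120011": 14, "125369": 12}

# Assumed line format: "Patch: 139572-02 Obsoletes: ... Packages: ..."
PATCH_RE = re.compile(r"^Patch:\s+(\d{6})-(\d{2})\b")


def installed_revisions(showrev_output):
    """Return the highest installed revision seen for each patch ID."""
    found = {}
    for line in showrev_output.splitlines():
        m = PATCH_RE.match(line.strip())
        if m:
            pid, rev = m.group(1), int(m.group(2))
            found[pid] = max(rev, found.get(pid, -1))
    return found


def missing_patches(showrev_output, required=REQUIRED):
    """List 'ID-rev' strings for required patches that are absent or too old."""
    found = installed_revisions(showrev_output)
    return sorted(
        "%s-%02d" % (pid, rev)
        for pid, rev in required.items()
        if found.get(pid, -1) < rev
    )
```

An empty list from `missing_patches` means every listed patch was found at or above the stated revision in the captured text.
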
Background
Data transactions go into the error buffer, and the error buffer on the SC fills up. By design, it holds only about 100 messages. Because Solaris can no longer write to the error buffer, notices appear in /var/adm/messages indicating "Unable to send ECC event message to System Controller". Example error buffer entries:

    Date: Thu May 07 15:50:45 EDT 2009
    Device: /partition0/domain0/SB2/dx2
    ErrorID: 0x32091ff0
    Port: 0
    Syndrome: 0x2f(CE bit 10)
    Direction: outgoing read
    First error: true
    TargetAid: 0x8
    Transid: 0x2
    .
    .
    Date: Thu May 07 15:50:45 EDT 2009
    Device: /partition0/domain0/SB2/dx3
    ErrorID: 0x33091ff0
    Port: 0
    Syndrome: 0x1c(CE bit 11)
    Direction: outgoing read
    First error: true
    TargetAid: 0x8
    Transid: 0x2
    .
    .
    Date: Thu May 07 15:50:46 EDT 2009
    Device: /partition0/domain0/SB0/dx2
    ErrorID: 0x32091ff0
    Port: 0
    Syndrome: 0x2f(CE bit 10)
    Direction: incoming read
    First error: true
    TargetAid: 0x4
    Transid: 0x1
    .
    .

NOTE: The service mode command clearerrorbuffer can be used to clear the error buffer and stop the "Unable to send" event messages from appearing in /var/adm/messages (unless the error storm persists).
However, service mode requires that you contact Oracle Support Services to obtain a password, and this special mode is to be used only by Oracle badged employees. This is one reason that clearerrorbuffer is not really a viable solution to this problem. The main reason is that clearing the buffer wipes all the errors it holds, which could prevent you from identifying the DIMM responsible for the noise. It is best to install the correct patches and/or replace the DIMM.

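Because the error buffer entries are what identify the noisy component, it can help to summarize them before anything is cleared. The following is a minimal sketch, assuming the entries have been saved to a text file with one `Field: value` pair per line and each record starting with a `Date:` line; the function names are hypothetical. It only counts records per device, syndrome, and direction, and does not map them to a specific DIMM.

```python
from collections import Counter


def parse_records(text):
    """Split saved error-buffer text into one dict per record."""
    records, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # skip separators such as "." and blank lines
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Date" and current:
            # A new "Date" field starts the next record.
            records.append(current)
            current = {}
        current[key] = value
    if current:
        records.append(current)
    return records


def tally(records):
    """Count records per (Device, Syndrome, Direction) combination."""
    return Counter(
        (r.get("Device"), r.get("Syndrome"), r.get("Direction"))
        for r in records
    )
```

Calling `tally(parse_records(text)).most_common()` lists the most frequent combinations first, which gives a starting point for the incoming versus outgoing analysis covered in the referenced note.
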
References
<NOTE:1002710.1> - Sun Fire[TM] v1280, 3800, 4800, 4810, 6800, E2900, E4900, E6900, and Netra[TM] 1280, and 1290 systems: Incoming versus Outgoing errors.
<NOTE:1010655.1> - Where can I get the service mode password for a Sun Fire[TM] 3800/48x0/6800/E4900/E6900/E2900/V1280 and Netra[TM] 1280/1290 server?

Attachments
This solution has no attachment