Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure
Solution 1672184.1: SunFire[TM] 12K/15K/E20K/E25K: System Controller failover because SCPER1 is being deconfigured
Applies to:
Sun Fire E25K Server - Version All Versions to All Versions [Release All Releases]
Sun Fire 12K Server - Version All Versions to All Versions [Release All Releases]
Sun Fire E20K Server - Version All Versions to All Versions [Release All Releases]
Sun Fire 15K Server - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms
The following messages appear when power is failing on the SC's peripheral board:
Apr 12 13:06:02 2014 localhost-sc1-hme0 esmd[25768]: [1920 9749903521434542 ERR DetectorV.cc 613] A high voltage has been detected on 3.3VHK, located on SCPER1. The voltage detected is 4.53v; should be 3.00v to 3.50v. SCPER1 is being deconfigured and powered off. Check all hardware for the cause.
Apr 12 13:12:56 2014 localhost-sc1-hme0 ssd[2106]: [1319 119715124649 NOTICE SSDWorkArea.cc 38] ssd output: SMS 1.6 start-up initiated
Apr 12 13:12:56 2014 localhost-sc1-hme0 ssd[2106]: [1319 119759223543 NOTICE SSDWorkArea.cc 38] ssd output: SC POST results: 'CP1500 POST Passed; SSCPOST v1.25 Passed'
Apr 12 13:12:56 2014 localhost-sc1-hme0 ssd[2106]: [1304 119852261626 NOTICE StartupManager.cc 2744] software component start-up initiated: name=hwad
Apr 12 13:12:58 2014 localhost-sc1-hme0 ssd[2106]: [1304 121402054917 NOTICE StartupManager.cc 2744] software component start-up initiated: name=mand
Apr 12 13:12:59 2014 localhost-sc1-hme0 ssd[2106]: [1304 121921660404 NOTICE StartupManager.cc 2744] software component start-up initiated: name=frad
Apr 12 13:12:59 2014 localhost-sc1-hme0 ssd[2106]: [1304 122441777767 NOTICE StartupManager.cc 2744] software component start-up initiated: name=fomd
[Other messages may appear here; they are just SC reporting messages.]
Apr 12 13:13:11 2014 localhost-sc1-hme0 fomd[2137]: [8600 134481139721 NOTICE FailoverMgr.cc 2842] Heartbeat interrupt detected
Apr 12 13:13:12 2014 localhost-sc1-hme0 ssd[2106]: [1320 135459079581 NOTICE StartupManager.cc 423] SMS software startup complete.
Apr 12 13:13:12 2014 localhost-sc1-hme0 fomd[2137]: [8563 135561498788 NOTICE FOConfig.cc 204] Failed to configure the logical interface - the interface may have already been removed, please check (ecode = -1)
Apr 12 13:13:12 2014 localhost-sc1-hme0 fomd[2137]: [8577 135562418705 NOTICE FailoverMgr.cc 3226] SC configured as Spare
Apr 12 13:13:12 2014 localhost-sc1-hme0 fomd[2137]: [8624 135737259010 NOTICE FMI2NetTest.cc 148] Remote SC is running SMS 1.6

Apr 12 13:06:44 2014 localhost-sc0-hme0 fomd[2240]: [8609 9228638330146745 ERR RemoteSC.cc 964] Remote SC call failed: RPC: Timed out
Apr 12 13:06:44 2014 localhost-sc0-hme0 fomd[2240]: [8569 9228638332467521 NOTICE FailoverMgr.cc 1377] The I2 network test FAILED
Apr 12 13:07:44 2014 localhost-sc0-hme0 fomd[2240]: [8612 9228698351571928 ERR FOHASram.cc 1824] Timeout waiting for response from remote SC
Apr 12 13:07:44 2014 localhost-sc0-hme0 fomd[2240]: [8569 9228698352973495 NOTICE FailoverMgr.cc 1377] The HASRAM network test FAILED
Apr 12 13:07:44 2014 localhost-sc0-hme0 fomd[2240]: [8599 9228698353719138 NOTICE FMHeartbeat.cc 223] Checking for SC heartbeat interrupts (can take up to 50 seconds) ...
Apr 12 13:08:09 2014 localhost-sc0-hme0 fomd[2240]: [8582 9228722870838730 NOTICE FailoverMgr.cc 5256] Not detecting remote SC's heartbeat interrupts
Apr 12 13:08:09 2014 localhost-sc0-hme0 fomd[2240]: [8574 9228722872157503 NOTICE FailoverMgr.cc 2297] Taking over main role because remote SC is unresponsive or down
Apr 12 13:08:09 2014 localhost-sc0-hme0 fomd[2240]: [8519 9228722873273673 NOTICE FailoverMgr.cc 2631] Failover deactivated
Apr 12 13:08:14 2014 localhost-sc0-hme0 fomd[2240]: [8570 9228728150848495 NOTICE FailoverMgr.cc 2356] Reset the remote SC
Apr 12 13:08:46 2014 localhost-sc0-hme0 hwad[2171]: [50144 9228760130438198 NOTICE DevPresent.cc 1172] Changed clock sources.

Changes
No changes have been made to the platform.

Cause
The voltage event occurs on the respective System Controller peripheral board.

Solution
Replace the System Controller Peripheral Board on the failing System Controller. Do not replace the SC itself.
Note the difference from Doc 1583980.1, where the power failure was on the SC.

References
<NOTE:1583980.1> - SunFire[TM] 12K/15K/E20K/E25K:System Controller Is Down
<NOTE:1001320.1> - SMS DC Power Supply Voltage Monitoring Flaw May Expose a Domain to Outage

Attachments
This solution has no attachment
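The distinction made in the Solution (peripheral board versus the SC itself) depends on which component the esmd voltage message names. As a rough aid when scanning platform logs, the following minimal Python sketch extracts the rail, component, and voltages from such a message. The regular expression is modeled only on the single esmd message quoted in this document, so the wording in other SMS releases is an assumption. The function name `parse_voltage_event` and the returned field names are hypothetical, not part of SMS.

```python
import re

# Pattern modeled on the one esmd message quoted in this document.
# Only the "high voltage" wording is confirmed by the source; other
# wordings that this pattern would also accept are an assumption.
VOLTAGE_MSG = re.compile(
    r"A (?P<kind>\w+) voltage has been detected on (?P<rail>\S+), "
    r"located on (?P<component>\S+)\. "
    r"The voltage detected is (?P<measured>[\d.]+)v; "
    r"should be (?P<low>[\d.]+)v to (?P<high>[\d.]+)v\."
)

# The esmd line from the Symptoms section, used as sample input.
SAMPLE = (
    "Apr 12 13:06:02 2014 localhost-sc1-hme0 esmd[25768]: "
    "[1920 9749903521434542 ERR DetectorV.cc 613] A high voltage has been "
    "detected on 3.3VHK, located on SCPER1. The voltage detected is 4.53v; "
    "should be 3.00v to 3.50v. SCPER1 is being deconfigured and powered off. "
    "Check all hardware for the cause."
)


def parse_voltage_event(line):
    """Return details of an esmd voltage message, or None if no match."""
    m = VOLTAGE_MSG.search(line)
    if m is None:
        return None
    measured = float(m["measured"])
    low, high = float(m["low"]), float(m["high"])
    component = m["component"]
    return {
        "rail": m["rail"],
        "component": component,
        "measured_v": measured,
        "range_v": (low, high),
        "out_of_range": not (low <= measured <= high),
        # True when the message names an SC peripheral board (SCPER0/SCPER1),
        # which is the case this document covers. False means the event is
        # outside the scope of this document, not that the SC is at fault.
        "on_peripheral_board": component.startswith("SCPER"),
    }


if __name__ == "__main__":
    print(parse_voltage_event(SAMPLE))
```

When `on_peripheral_board` is true, the event matches the case described here. Any other component name should be checked against the referenced documents instead of being handled by this one.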