Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1994858.1
Update Date:2017-09-29

Solution Type  Problem Resolution Sure

Solution  1994858.1 :   Sun SPARC(R) Enterprise M3000/M4000/M5000/M8000/M9000 (OPL) Servers: XSCF's hostname set to "localhost" may cause XSCF unplanned reboots and domain(s) outage  


Related Items
  • Sun SPARC Enterprise M8000 Server
  • Sun SPARC Enterprise M4000 Server
  • Sun SPARC Enterprise M3000 Server
  • Sun SPARC Enterprise M9000-32 Server
  • Sun SPARC Enterprise M5000 Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: Mx000

In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-10468113941>

Applies to:

Sun SPARC Enterprise M3000 Server - Version All Versions and later
Sun SPARC Enterprise M8000 Server - Version All Versions and later
Sun SPARC Enterprise M9000-32 Server - Version All Versions and later
Sun SPARC Enterprise M4000 Server - Version All Versions and later
Sun SPARC Enterprise M5000 Server - Version All Versions and later
Information in this document applies to any platform.

Symptoms

On an Mx000 platform, if the XSCF's hostname has been set to "localhost" and the IP address assigned to the XSCF is not 127.0.0.1, a domain cannot be powered on because the XSCF reboots itself, as shown in the example below, captured while connected to the XSCF's serial port:

XSCF> poweron -d 0
DomainIDs to power on:00
Continue? [y|n] :y

Mar 24 17:08:05 localhost XSCF[106]: process down [/scf/sbin/sequence] (pid=776, term=6) process down detect by init
Mar 24 17:08:05 localhost XSCF[106]: Reboot sequence start
Mar 24 17:08:05 localhost XSCF[106]: XSCF shutdown sequence start
execute K000end -- complete
Aborted
execute K100end -- complete
execute K101end -- complete
unmount /hcp0/linux
unmount /hcp0/scfprog
unmount /hcp0/gendata -- complete
unmount /hcp0/remcscm -- complete
unmount /hcp1/linux
unmount /hcp1/scfprog
unmount /hcp1/gendata
unmount /hcp1/remcscm
unmount /hcpcommon/setup -- complete
unmount /hcpcommon/obp -- complete
unmount /hcpcommon/tmp -- complete
unmount /hcpcommon/var
unmount /hcpcommon/scflog1 -- complete
unmount /hcpcommon/scflog2 -- complete
...snip...                                            ---> XSCF reboot and poweron aborted

After the reboot, the XSCF is available again and events similar to the following are logged:

Mar 24 16:21:40 localhost Information: /FIRMWARE,/XSCFU:SCF:XSCF process down detected
Mar 24 16:48:46 localhost Information: /FIRMWARE,/XSCFU:SCF:XSCF process down detected
Mar 24 16:53:41 localhost Information: /FIRMWARE,/XSCFU:SCF:XSCF process down detected

However, a subsequent domain poweron will fail again.

Also, while the XSCF is in this configuration, if a domain is at the OK prompt and the platform administrator performs a reset-all (or a boot after an XIR), the XSCF enters a reboot loop and stops after 3 reboots with the following error:
XSCF FAULT (reason=0)

 

Cause

The issue is caused by a known XSCF bug that is triggered by the incorrect hostname setting: when the hostname is "localhost", the XSCF expects its IP address to be 127.0.0.1; if it is not, the XSCF's OS process /scf/sbin/sequence dies and the XSCF reboots.

You can confirm that the system is hitting this issue by checking that the process down message on the XSCF names /scf/sbin/sequence (see the example above).
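
As a rough illustration of that check, the following Python sketch scans captured serial console output for the process down message and reports every occurrence that names /scf/sbin/sequence. The function name and the regular expression are illustrative assumptions, as is the premise that the message always follows the layout shown in the example above; this is not an Oracle-supplied tool.

```python
import re

# Assumed layout, taken from the console example in this document:
#   ... XSCF[106]: process down [/scf/sbin/sequence] (pid=776, term=6) ...
PROCESS_DOWN = re.compile(
    r"process down \[(?P<path>[^\]]+)\] \(pid=(?P<pid>\d+), term=(?P<term>\d+)\)"
)

def sequence_process_down(lines):
    """Return (pid, term) for each process down line naming /scf/sbin/sequence."""
    hits = []
    for line in lines:
        match = PROCESS_DOWN.search(line)
        if match and match.group("path") == "/scf/sbin/sequence":
            hits.append((int(match.group("pid")), int(match.group("term"))))
    return hits

# Two lines copied from the console capture shown in the Symptoms section.
console = [
    "Mar 24 17:08:05 localhost XSCF[106]: process down [/scf/sbin/sequence] "
    "(pid=776, term=6) process down detect by init",
    "Mar 24 17:08:05 localhost XSCF[106]: Reboot sequence start",
]

print(sequence_process_down(console))
```

An empty result means no process down message for /scf/sbin/sequence was found in the capture, in which case another cause should be investigated.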

 

Solution

XSCF replacement and/or a platform Firmware upgrade will not fix the issue. Note that, starting with Firmware (aka XCP release) 1080, the XSCF refuses to accept "localhost" as a hostname:

XSCF> sethostname xscf#0 localhost
"localhost" are not allowed.

but the hostname is preserved if "localhost" was set up in the past (with older firmware releases) and the Firmware is then upgraded.

Steps to fix

You'll need to modify the XSCF's hostname and reboot the XSCF; please check the example below:

XSCF> sethostname xscf#0 otherthenlocalhost
XSCF> applynetwork
The following network settings will be applied:
xscf#0 hostname :otherthenlocalhost
DNS domain name :test.oracle.com
...snip...

Continue? [y|n] :y
Please reset the XSCF by rebootxscf to apply the network settings.
Please confirm that the settings have been applied by executing
showhostname, shownetwork, showroute and shownameserver after rebooting
the XSCF.
XSCF> rebootxscf
The XSCF will be reset. Continue? [y|n] :y

Upon reboot, XSCF's hostname will be modified and domain poweron will work.

Best practice 

At installation/configuration time, avoid using "localhost" as the XSCF's hostname, and change it according to your organization's standards.
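
A pre-deployment check along these lines could be scripted. The Python sketch below flags the combination described in this document (hostname "localhost" with an IP address other than 127.0.0.1). The function name and the sample addresses are hypothetical, and the case-insensitive comparison is a conservative assumption rather than documented XSCF behavior.

```python
import ipaddress

LOOPBACK = ipaddress.ip_address("127.0.0.1")

def is_risky_xscf_setting(hostname, ip):
    """Return True for hostname 'localhost' paired with an IP other than 127.0.0.1."""
    # Assumption: compare case-insensitively to err on the side of caution.
    is_localhost = hostname.strip().lower() == "localhost"
    return is_localhost and ipaddress.ip_address(ip) != LOOPBACK

# Hypothetical planned settings, for illustration only.
print(is_risky_xscf_setting("localhost", "192.168.1.10"))
print(is_risky_xscf_setting("xscf0-lab", "192.168.1.10"))
```

The first call reports the risky combination; the second does not, because the hostname differs from "localhost".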

 


 

Internal only

Procedure to fix the XSCF reboot loop (taken from Bug 15452602):

1. Insert an XSCF with factory mode enabled (this is why the issue cannot be fixed in the field)
2. cd /scf/bin; mv sethostname_boot sethostname_boot.orig
... this will prevent the XSCF from working correctly
... as a side effect, the XSCF won't try to reboot/reset the domain at the next XSCF reboot
3. When you see the message "XSCF Initialize complete.", rename the script to its original name (mv sethostname_boot.orig sethostname_boot)
4. reboot
5. After the reboot, wait for "XSCF Initialize complete."
6. Log in as root and fix the hostname with sethostname + applynetwork
7. disable factory mode
8. rebootxscf

References

  1. SPARC Enterprise M3000/M4000/M5000/M8000/M9000 - XCP Firmware to Defect Cross Reference (Doc ID 1380260.1)
  2. Sun SPARC[R] Enterprise M3000/M4000/M5000/M8000/M9000 (OPL): Information & Troubleshooting certain XSCFU faults (Doc ID 1012822.1)
Please note that Doc ID 1012822.1 references CR# 6653783 when an XSCF self-reboot generates FMA event SCF-8006-YS; the "localhost" issue can also generate FMA event SCF-8005-NE or similar ones.

Please note that a similar "XSCF FAULT (reason=0)" error code may also appear in scenarios where the XSCF itself is faulty; see the following document:
Sun SPARC Enterprise Mx000 Server: How to gather information when the XSCF is inaccessible via the network (Doc ID 1395544.1)
Generally, in that case the XSCF fails to boot completely, the XSCF prompt never becomes available, and the issue is not directly triggered by domain poweron.

<BUG:15452602> - SUNBT6653783 APPLYNETWORK SHOULD RETURN AN ERROR IF HOSTNAME=LOCALHOST AND IP AD

  Copyright © 2018 Oracle, Inc.  All rights reserved.