Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2192795.1
Update Date: 2017-09-29
Keywords:

Solution Type  Problem Resolution Sure

Solution  2192795.1 :   SPARC(R) Enterprise M3000/M4000/M5000/M8000/M9000 (OPL) Servers: Upon XSCF reboot, the sckmd, pppd and scfd daemons may generate messages on the Solaris domain that can be misinterpreted as faults


Related Items
  • Sun SPARC Enterprise M4000 Server
  • Sun SPARC Enterprise M9000-32 Server
  • Sun SPARC Enterprise M5000 Server
  • Sun SPARC Enterprise M9000-64 Server
  • Sun SPARC Enterprise M8000 Server
  • Sun SPARC Enterprise M3000 Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: Mx000






Created from <SR 3-12826932031>

Applies to:

Sun SPARC Enterprise M4000 Server - Version All Versions to All Versions [Release All Releases]
Sun SPARC Enterprise M5000 Server - Version All Versions to All Versions [Release All Releases]
Sun SPARC Enterprise M8000 Server - Version All Versions to All Versions [Release All Releases]
Sun SPARC Enterprise M3000 Server - Version All Versions to All Versions [Release All Releases]
Sun SPARC Enterprise M9000-32 Server - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

Solaris domains running on SPARC(R) Enterprise M3000/M4000/M5000/M8000/M9000 (OPL) Servers may log messages similar to the following in /var/adm/messages:

May 24 12:08:16 domaintest pppd[673]: [ID 702911 daemon.notice] Modem hangup
May 24 12:08:16 domaintest ip: [ID 646971 kern.notice] ip_create_dl: hw addr length = 0
May 24 12:08:16 domaintest pppd[673]: [ID 702911 daemon.notice] Connection terminated.
May 24 12:08:34 domaintest scfd: [ID 556826 kern.warning] WARNING: scfd: SCF went to offline mode. unit=0
May 24 12:08:34 domaintest sckmd: [ID 889677 daemon.error] failed to receive sckm message: I/O error

System administrators may interpret these messages as a fault.
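As a quick way to spot this message group in a saved copy of /var/adm/messages, the following is a minimal sketch. The patterns are copied from the sample above; the function name `daemons_reporting` and the `SIGNATURES` table are illustrative assumptions, not part of any Oracle tool.

```python
import re

# Message fragments copied from the sample log above. The table and the
# function name are hypothetical helpers for illustration only.
SIGNATURES = {
    "pppd": re.compile(r"pppd\[\d+\]: .*(Modem hangup|Connection terminated)"),
    "scfd": re.compile(r"scfd: .*SCF went to offline mode"),
    "sckmd": re.compile(r"sckmd: .*failed to receive sckm message"),
}


def daemons_reporting(lines):
    """Return the sorted names of daemons whose signature appears in lines."""
    seen = set()
    for line in lines:
        for daemon, pattern in SIGNATURES.items():
            if pattern.search(line):
                seen.add(daemon)
    return sorted(seen)


SAMPLE = """\
May 24 12:08:16 domaintest pppd[673]: [ID 702911 daemon.notice] Modem hangup
May 24 12:08:16 domaintest ip: [ID 646971 kern.notice] ip_create_dl: hw addr length = 0
May 24 12:08:16 domaintest pppd[673]: [ID 702911 daemon.notice] Connection terminated.
May 24 12:08:34 domaintest scfd: [ID 556826 kern.warning] WARNING: scfd: SCF went to offline mode. unit=0
May 24 12:08:34 domaintest sckmd: [ID 889677 daemon.error] failed to receive sckm message: I/O error
"""

if __name__ == "__main__":
    print(daemons_reporting(SAMPLE.splitlines()))
```

The sketch only reports which daemons logged a matching message; it does not establish whether an XSCF reboot was the cause.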

Cause

The Solaris instance running on an Mx000 domain and the XSCF are connected by an internal point-to-point network called DSCP (Domain to Service Processor Communications Protocol; see Document 1009921.1 for details):

root@m4000-test-dom0:~# ifconfig -a
...snip...
sppp0: flags=1010010008d1<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST,IPv4,FIXEDMTU,PHYSRUNNING> mtu 1500 index 3
        inet 192.168.224.2 --> 192.168.224.1 netmask ffffff00
        ether 0:0:0:0:0:0

XSCF> showdscp

DSCP Configuration:

Network: 192.168.224.0
Netmask: 255.255.255.0

Location     Address
----------   ---------
XSCF         192.168.224.1
Domain #00   192.168.224.2
Domain #01   192.168.224.3

The Solaris messages shown in the Symptoms section, when they appear together in a short sequence, are usually the result of an XSCF reboot.

The Solaris daemons that manage the internal connection between the Solaris OE and the XSCF report errors because the network peer (the XSCF) is temporarily unavailable.
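The relationship between the two outputs above can be checked mechanically: the local address of sppp0 should be one of the domain addresses listed by showdscp, and the remote address should be the XSCF address. The following is a minimal sketch under that assumption; `check_dscp_link` and `LOCATIONS` are hypothetical names, and the values are typed in by hand from the sample output rather than parsed from a live system.

```python
import ipaddress


def check_dscp_link(local, remote, network, netmask, locations):
    """Return a list of mismatches between the sppp0 link and showdscp data."""
    net = ipaddress.ip_network(f"{network}/{netmask}")
    problems = []
    # Both ends of the point-to-point link should sit inside the DSCP network.
    for label, addr in (("local", local), ("remote", remote)):
        if ipaddress.ip_address(addr) not in net:
            problems.append(f"{label} address {addr} is outside {net}")
    # The peer of the domain is expected to be the XSCF.
    if locations.get("XSCF") != remote:
        problems.append("remote address is not the XSCF address")
    # The domain side is expected to be one of the listed domain addresses.
    domains = [a for name, a in locations.items() if name.startswith("Domain")]
    if local not in domains:
        problems.append("local address is not a listed domain address")
    return problems


# Values copied by hand from the showdscp sample above.
LOCATIONS = {
    "XSCF": "192.168.224.1",
    "Domain #00": "192.168.224.2",
    "Domain #01": "192.168.224.3",
}

if __name__ == "__main__":
    print(check_dscp_link("192.168.224.2", "192.168.224.1",
                          "192.168.224.0", "255.255.255.0", LOCATIONS))
```

An empty list means the domain-side interface and the XSCF-side configuration agree for the sample values.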

Solution

This behavior can easily be reproduced by rebooting the XSCF while the Solaris domain is up and running:

XSCF> showdate
Thu Oct 13 09:10:00 BST 2016

XSCF> rebootxscf
The XSCF will be reset. Continue? [y|n] :y

Just after the XSCF reboot is issued, the Solaris messages file shows:

Oct 13 09:10:23 m4000-test-dom0 scfd: [ID 556826 kern.warning] WARNING: scfd: SCF went to offline mode. unit=0
Oct 13 09:10:23 m4000-test-dom0 pppd[588]: [ID 702911 daemon.notice] Modem hangup
Oct 13 09:10:23 m4000-test-dom0 sckmd: [ID 889677 daemon.error] failed to receive sckm message: I/O error
Oct 13 09:10:23 m4000-test-dom0 pppd[588]: [ID 702911 daemon.notice] Connection terminated.
Oct 13 09:15:22 m4000-test-dom0 scfd: [ID 492411 kern.notice] NOTICE: scfd: SCF online.

Later, when the XSCF reboot completes, the connection between the XSCF and the domain is re-established automatically:

Oct 13 09:15:32 m4000-tvp540-d-dom0 pppd[588]: [ID 702911 daemon.notice] Connect: sppp0 <--> /dev/dm2s0
Oct 13 09:15:33 m4000-tvp540-d-dom0 pppd[588]: [ID 702911 daemon.notice] local IP address 192.168.224.2
Oct 13 09:15:33 m4000-tvp540-d-dom0 pppd[588]: [ID 702911 daemon.notice] remote IP address 192.168.224.1
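To put a number on how long the domain saw the XSCF as unavailable in this test, the interval between the "SCF went to offline mode" and "SCF online" messages can be computed. The following is a minimal sketch; the function names are hypothetical, the year is supplied by the caller because syslog timestamps carry none, and month names are assumed to be in English (the `%b` directive depends on the locale).

```python
from datetime import datetime


def parse_syslog_time(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' stamp; the year is an assumption."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def scf_offline_seconds(lines, year):
    """Seconds between the first offline message and the next online message."""
    offline = None
    for line in lines:
        if "SCF went to offline mode" in line and offline is None:
            offline = parse_syslog_time(line, year)
        elif "SCF online" in line and offline is not None:
            return (parse_syslog_time(line, year) - offline).total_seconds()
    return None  # no complete offline/online pair found


# Two lines copied from the test log above.
SAMPLE = [
    "Oct 13 09:10:23 m4000-test-dom0 scfd: [ID 556826 kern.warning] WARNING: scfd: SCF went to offline mode. unit=0",
    "Oct 13 09:15:22 m4000-test-dom0 scfd: [ID 492411 kern.notice] NOTICE: scfd: SCF online.",
]

if __name__ == "__main__":
    print(scf_offline_seconds(SAMPLE, 2016))  # 299.0
```

For the test shown here the gap is just under five minutes. Durations on other systems may differ.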

 

Please note that:

  • the above messages may also be caused by a real problem with one or more of these daemons; verify the relationship with an XSCF reboot by collecting the relevant XSCF data (for example, an XSCF snapshot; see Document 2097446.1)
  • some of the above logs may be missing if the corresponding Solaris daemon is not running:
root@m4000-test-dom0:~# svcs -a | egrep 'dcs|sck|dsc|ppp'
legacy_run     Oct_12   lrc:/etc/rc2_d/S47pppd
online         Oct_12   svc:/platform/sun4u/sckmd:default
online         Oct_12   svc:/platform/sun4u/dscp:default
online         Oct_12   svc:/platform/sun4u/dcs:default

If the messages are clearly the effect of an XSCF reboot, they can be safely ignored.
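When checking which of the related services are present, the three-column svcs output (state, start time, FMRI) can be reduced to a state per daemon. The following is a minimal sketch; `daemon_states` is a hypothetical helper, and matching a daemon by the end of the last FMRI path component is an assumption made for this example.

```python
def daemon_states(svcs_output, names):
    """Map each daemon name to the state reported by svcs, or None if absent."""
    states = {name: None for name in names}
    for line in svcs_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip prompts, headers and blank lines
        state, _stime, fmri = fields
        # Last path component without the instance, e.g. "sckmd" or "S47pppd".
        leaf = fmri.split("/")[-1].split(":")[0]
        for name in names:
            if leaf.endswith(name):
                states[name] = state
    return states


# Output copied from the svcs example above.
SVCS_SAMPLE = """\
legacy_run Oct_12 lrc:/etc/rc2_d/S47pppd
online Oct_12 svc:/platform/sun4u/sckmd:default
online Oct_12 svc:/platform/sun4u/dscp:default
online Oct_12 svc:/platform/sun4u/dcs:default
"""

if __name__ == "__main__":
    print(daemon_states(SVCS_SAMPLE, ["pppd", "sckmd", "dscp", "dcs"]))
```

Note that legacy_run only indicates that the rc script was executed, so it does not by itself confirm that pppd is still running.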

  


 

Internal Section

Scenarios like the one outlined below

Jun 4 05:47:33 nwr2db3no pppd[673]: [ID 702911 daemon.notice] Modem hangup
Jun 4 05:47:33 nwr2db3no ip: [ID 646971 kern.notice] ip_create_dl: hw addr length = 0
Jun 4 05:47:33 nwr2db3no pppd[673]: [ID 702911 daemon.notice] Connection terminated.
Jun 4 05:47:52 nwr2db3no scfd: [ID 556826 kern.warning] WARNING: scfd: SCF went to offline mode. unit=0
Jun 4 05:47:52 nwr2db3no sckmd: [ID 889677 daemon.error] failed to receive sckm message: I/O error
Jun 4 05:59:51 nwr2db3no genunix: [ID 540533 kern.notice] ^MSunOS Release 5.10 Version Generic_142900-03 64-bit   ---> unplanned domain reboot
Jun 4 05:59:51 nwr2db3no genunix: [ID 943908 kern.notice] Copyright 1983-2009 Sun Microsystems, Inc. All rights reserved.

Jun 5 04:05:33 nwr2db3no pppd[681]: [ID 702911 daemon.notice] Modem hangup
Jun 5 04:05:33 nwr2db3no ip: [ID 646971 kern.notice] ip_create_dl: hw addr length = 0
Jun 5 04:05:33 nwr2db3no pppd[681]: [ID 702911 daemon.notice] Connection terminated.
Jun 5 04:05:50 nwr2db3no scfd: [ID 556826 kern.warning] WARNING: scfd: SCF went to offline mode. unit=0
Jun 5 04:05:50 nwr2db3no sckmd: [ID 889677 daemon.error] failed to receive sckm message: I/O error
Jun 5 04:17:45 nwr2db3no genunix: [ID 540533 kern.notice] ^MSunOS Release 5.10 Version Generic_142900-03 64-bit
Jun 5 04:17:45 nwr2db3no genunix: [ID 943908 kern.notice] Copyright 1983-2009 Sun Microsystems, Inc. All rights reserved.
Jun 5 04:17:45 nwr2db3no Use is subject to license terms.

May 24 12:08:16 nwr2db3no pppd[673]: [ID 702911 daemon.notice] Modem hangup
May 24 12:08:16 nwr2db3no ip: [ID 646971 kern.notice] ip_create_dl: hw addr length = 0
May 24 12:08:16 nwr2db3no pppd[673]: [ID 702911 daemon.notice] Connection terminated.
May 24 12:08:34 nwr2db3no scfd: [ID 556826 kern.warning] WARNING: scfd: SCF went to offline mode. unit=0
May 24 12:08:34 nwr2db3no sckmd: [ID 889677 daemon.error] failed to receive sckm message: I/O error
May 24 12:23:45 nwr2db3no genunix: [ID 540533 kern.notice] ^MSunOS Release 5.10 Version Generic_142900-03 64-bit
May 24 12:23:45 nwr2db3no genunix: [ID 943908 kern.notice] Copyright 1983-2009 Sun Microsystems, Inc. All rights reserved.
May 24 12:23:45 nwr2db3no Use is subject to license terms.

May 29 08:26:16 nwr2db3no pppd[455]: [ID 702911 daemon.notice] Connection terminated.
May 29 08:26:34 nwr2db3no scfd: [ID 556826 kern.warning] WARNING: scfd: SCF went to offline mode. unit=0
May 29 08:26:34 nwr2db3no sckmd: [ID 889677 daemon.error] failed to receive sckm message: I/O error
May 29 08:38:19 nwr2db3no genunix: [ID 540533 kern.notice] ^MSunOS Release 5.10 Version Generic_142900-03 64-bit
May 29 08:38:19 nwr2db3no genunix: [ID 943908 kern.notice] Copyright 1983-2009 Sun Microsystems, Inc. All rights reserved.
May 29 08:38:19 nwr2db3no Use is subject to license terms.

may indicate that unplanned domain resets occur every time the XSCF reboots.

This must be investigated carefully by collecting the relevant XSCF snapshot, so that the domain reboots can be matched against the XSCF reboots and the hypothesis confirmed. One possible root cause is described in the following document:

All M-Series Systems XSCFUs May Fail and/or Halt Due to Berkeley DataBase Corruption Without System Firmware Upgrade Version 1112 on SPARC Enterprise M8000/M9000-32/M9000-64 Servers, or a Minimum Version of 1113 on M3000/M4000/M5000 Servers (Doc ID 1458754.1)

This applies when the platform firmware is below release 1092 and BDB corruption is detected upon XSCF reboot, in which case the active domains are reset.
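As a first pass before collecting the XSCF snapshot, the domain log can be scanned for a Solaris boot banner that follows an "SCF went to offline mode" message with no "SCF online" message in between. The following is a minimal sketch of that heuristic; `resets_after_scf_offline` is a hypothetical helper, it assumes the lines are in chronological order, and a planned reboot in the same interval would match as well.

```python
def resets_after_scf_offline(lines):
    """Return (offline_stamp, boot_stamp) pairs for boots seen while SCF is offline."""
    suspects = []
    pending = None  # timestamp of the last unmatched offline message
    for line in lines:
        stamp = " ".join(line.split()[:3])
        if "SCF went to offline mode" in line:
            pending = stamp
        elif "SCF online" in line:
            pending = None  # XSCF came back without a domain boot
        elif "SunOS Release" in line and pending is not None:
            suspects.append((pending, stamp))
            pending = None
    return suspects


# Lines copied from the Jun 4 sequence above.
SAMPLE = [
    "Jun 4 05:47:33 nwr2db3no pppd[673]: [ID 702911 daemon.notice] Modem hangup",
    "Jun 4 05:47:52 nwr2db3no scfd: [ID 556826 kern.warning] WARNING: scfd: SCF went to offline mode. unit=0",
    "Jun 4 05:47:52 nwr2db3no sckmd: [ID 889677 daemon.error] failed to receive sckm message: I/O error",
    "Jun 4 05:59:51 nwr2db3no genunix: [ID 540533 kern.notice] ^MSunOS Release 5.10 Version Generic_142900-03 64-bit",
]

if __name__ == "__main__":
    print(resets_after_scf_offline(SAMPLE))
```

A match is only a hint. The XSCF snapshot is still needed to confirm that the domain reset and the XSCF reboot are related.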


  Copyright © 2018 Oracle, Inc.  All rights reserved.