SMF-8000-YX - A Service In Maintenance State

Asset ID:	1-79-1173666.1
Update Date:	2017-09-19

Solution Type:	Predictive Self-Healing Sure


Applies to:

Solaris Operating System - Version 10 3/05 and later
Oracle ZFS Storage ZS3-4
SPARC T8-4
SPARC T8-2
SPARC T8-1
Information in this document applies to any platform.

Purpose

Provide additional information for Message ID: SMF-8000-YX

Details

Type

Defect

defect.sunos.smf.svc.maintenance

Severity

Major

Description

A service failed and could not be restarted.

Automated Response

The service has been placed into the maintenance state.

Impact

The service is unavailable.

Suggested Action for System Administrator

Run svcs -x to determine why the service failed and the location of logfiles (/var/svc/log), if any.

Details

Summary

When the service management facility (see smf(5)) determines that a service instance should be placed into the maintenance state, the fault management subsystem tracks this maintenance state via a new problem diagnosis with the diagnosis message ID SMF-8000-YX.

The message ID SMF-8000-YX is a generic identifier for "a service entering maintenance state", regardless of the affected service or the reason it entered that state. After investigating and addressing the cause of the maintenance state (see below), a suitably privileged administrator may clear that state using either SMF commands (svcadm) or fault management commands (fmadm).

Why Do Service Instances Enter Maintenance State?

Common failure modes that result in an instance being placed in maintenance state are:

- A service method failed, for example if the service start method exits with $SMF_EXIT_ERR_CONFIG having detected some invalid configuration.
- The service instance is starting successfully but then failing and restarting again, repeatedly.
- Some other service determined that this instance needs to go into maintenance state, and made the request.
- An administrator requested maintenance state using svcadm.

The failure modes will be tabulated below, but note that these are simply the generically observable failure symptoms and do not describe the particular reason why a given service is exhibiting those symptoms. For example, if a service start method for svc:/foo/bar:default exits with $SMF_EXIT_ERR_CONFIG because a required configuration file is corrupt, the failure mode is "(start) method failed" and the administrator will have to look into log files and the like for the affected service to determine exactly why svc:/foo/bar:default has a configuration problem.
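As a rough illustration of this failure mode, the sketch below shows how a start method might detect a bad configuration file and return $SMF_EXIT_ERR_CONFIG. The check_config function, the path /etc/foo/bar.conf and the daemon /usr/lib/foo/bard are hypothetical names for the svc:/foo/bar:default example, not part of any real service. The fallback value 96 matches the exit status shown in the example log later in this article. A POSIX shell is assumed.

```shell
#!/bin/sh
# Hypothetical start method fragment for svc:/foo/bar:default (illustration only).

# On Solaris, smf_include.sh defines the SMF_EXIT_* values. The fallbacks
# below only let this sketch run on a system without SMF.
if [ -r /lib/svc/share/smf_include.sh ]; then
    . /lib/svc/share/smf_include.sh
fi
: "${SMF_EXIT_OK:=0}"
: "${SMF_EXIT_ERR_CONFIG:=96}"

# check_config FILE: return $SMF_EXIT_ERR_CONFIG when FILE is missing or
# empty, $SMF_EXIT_OK otherwise. The message goes to standard error, which
# ends up in the service instance log.
check_config() {
    if [ ! -s "$1" ]; then
        echo "Error: Configuration file '$1' not found or empty." >&2
        return "$SMF_EXIT_ERR_CONFIG"
    fi
    return "$SMF_EXIT_OK"
}

# A real start method might then continue along these lines:
#   check_config /etc/foo/bar.conf || exit $?
#   /usr/lib/foo/bard            # hypothetical daemon
#   exit $SMF_EXIT_OK
```

When such a method exits with $SMF_EXIT_ERR_CONFIG, the symptom reported is "(start) method failed"; the message written to standard error is what gives the administrator the actual reason.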

What To Do?

In the following we write <fmri> for the FMRI of the affected service instance, for example svc:/network/ntp:default for the NTP service.

1) Use svcs -xv to determine which instance (or instances) are in maintenance state, or use the svcs -xv <fmri> command from the console output, as in the example below. For each such instance, svcs -xv lists a reason for the maintenance state (a failure symptom, as explained above) and links to an article that elaborates on that failure symptom. The output also lists the location of the service instance log file, which should be the starting point for additional investigation. All instance method output is accumulated in the service instance log file; the application may also log application-specific detail to other, unrelated logfiles.
2) Investigate and address the root cause of the failure using the initial clues from 1) and your knowledge of the affected service.
3) Perform one of the following:
- To clear maintenance state and have the instance attempt to move to either online or disabled state (according to current repository state for this instance, as shown by svcprop <fmri> general/enabled) use svcadm clear <fmri>; you can also use fmadm repaired <problem-uuid>. If using svcadm you may abbreviate the FMRI as per usual SMF practice.
- To disable the instance, possibly without having addressed the cause of the maintenance state, use svcadm disable <fmri>.
4) Use svcs <fmri> to verify that the service instance does not return to maintenance state. The instance will attempt to enter the online or disabled state, as designated in the repository (that is, if it was enabled then on clear it will try to move to online state; if it was disabled it will simply move to that state). Wait a short time before verifying online state, because the start method needs time to run.
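The clear-and-verify part of the procedure above can be sketched as a small script. This is only an outline under stated assumptions: the function names wait_for_state and clear_and_verify are made up for this example, the 30-poll limit is arbitrary, and a POSIX shell with sufficient privileges is assumed. It relies on svcs -H -o state <fmri> to print just the state column.

```shell
#!/bin/sh
# Sketch: clear maintenance state, then poll until the instance settles.

# wait_for_state FMRI [TRIES]: poll the instance state once per second.
# Returns 0 once the instance is online or disabled, 1 if it has not
# settled after TRIES polls (default 30, an arbitrary choice).
wait_for_state() {
    fmri=$1
    tries=${2:-30}
    state=unknown
    while [ "$tries" -gt 0 ]; do
        state=$(svcs -H -o state "$fmri") || return 1
        case $state in
            online|disabled) echo "$fmri: $state"; return 0 ;;
        esac
        tries=$((tries - 1))
        sleep 1
    done
    echo "$fmri: still $state after waiting" >&2
    return 1
}

# clear_and_verify FMRI: run only after the root cause has been addressed.
clear_and_verify() {
    svcadm clear "$1" && wait_for_state "$1"
}
```

For example, clear_and_verify svc:/network/ntp:default would report the final state once the start method has had time to run. If the instance is reported as still in maintenance, return to the service instance log.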

Example

Suppose that svc:/network/ntp:default is enabled but that the configuration file /etc/inet/ntp.conf is absent. On the console we see a new problem diagnosis as follows (and other notification mechanisms that are configured such as snmp and email will show similar information):

SUNW-MSG-ID: SMF-8000-YX, TYPE: defect, VER: 1, SEVERITY: major
EVENT-TIME: Mon May 17 22:38:34 PDT 2010
PLATFORM: Sun-Fire-V40z, CSN: XG051535088, HOSTNAME: parity
SOURCE: software-diagnosis, REV: 0.1
EVENT-ID: 97911e1b-f7a3-cc69-f850-c969e0a7c222
DESC: A service failed - a start, stop or refresh method failed
  Refer to http://sun.com/msg/SMF-8000-YX for more information.
AUTO-RESPONSE: The service has been placed into the maintenance state.
IMPACT: svc:/network/ntp:default is unavailable
REC-ACTION: Run svcs -xv svc:/network/ntp:default to determine why the service failed and the location of logfiles, if any
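If the console message has been saved to a file or captured from a notification, the problem UUID can be pulled out of the EVENT-ID line for later use with fmadm. The following is a minimal sketch, assuming the message layout shown above; the function name event_id is invented for this example.

```shell
#!/bin/sh
# event_id: read an SMF-8000-YX message on standard input and print the
# value of its EVENT-ID line (the problem UUID).
event_id() {
    sed -n 's/^EVENT-ID: *//p'
}

# Sample lines taken from the console message above.
msg='SUNW-MSG-ID: SMF-8000-YX, TYPE: defect, VER: 1, SEVERITY: major
EVENT-ID: 97911e1b-f7a3-cc69-f850-c969e0a7c222
DESC: A service failed - a start, stop or refresh method failed'

uuid=$(printf '%s\n' "$msg" | event_id)
echo "$uuid"    # prints 97911e1b-f7a3-cc69-f850-c969e0a7c222
```

The same UUID is what fmadm repaired expects once the root cause has been fixed.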

While SMF-8000-YX is a generic "maintenance state" code, the message above does have some dynamic aspects that are specific to this particular case, such as the event details, the description, the affected service FMRI and the suggested command. Running the suggested command we see:

# svcs -xv svc:/network/ntp:default
svc:/network/ntp:default (Network Time Protocol (NTP) Version 4)
 State: maintenance since Mon May 17 22:38:34 2010
Reason: Start method exited with $SMF_EXIT_ERR_CONFIG.
   See: http://sun.com/msg/SMF-8000-KS
   See: man -M /usr/share/man -s 1M ntpd
   See: man -M /usr/share/man -s 4 ntp.conf
   See: man -M /usr/share/man -s 1M ntp
   See: /var/svc/log/network-ntp:default.log
Impact: This service is not running.
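When several instances are in maintenance, it can be convenient to extract the log file location from the svcs -xv output rather than read it by eye. The sketch below assumes the log is listed as a "See:" entry that is an absolute path ending in .log, as in the output above; the function name log_path is made up for this example.

```shell
#!/bin/sh
# log_path: read svcs -xv output on standard input and print every
# "See:" entry that is an absolute path ending in .log.
log_path() {
    awk '$1 == "See:" && $2 ~ /^\/.*\.log$/ { print $2 }'
}

# Sample taken from the svcs -xv output above.
sample='svc:/network/ntp:default (Network Time Protocol (NTP) Version 4)
 State: maintenance since Mon May 17 22:38:34 2010
Reason: Start method exited with $SMF_EXIT_ERR_CONFIG.
   See: http://sun.com/msg/SMF-8000-KS
   See: man -M /usr/share/man -s 1M ntpd
   See: /var/svc/log/network-ntp:default.log
Impact: This service is not running.'

log=$(printf '%s\n' "$sample" | log_path)
echo "$log"    # prints /var/svc/log/network-ntp:default.log
```

On a live system the same function could be fed directly, for example svcs -xv svc:/network/ntp:default | log_path, and the resulting file inspected with tail.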

Note that svcs -xv output has been more specific than the console messaging in that it has indicated which method failed and what it returned; it also links to an article that elaborates on the "start method failed" failure mode, and provides a pointer to the service instance log. Inspecting the tail of that log and correlating with the timestamp of 22:38 above we see:

[ May 17 22:38:34 Enabled. ]
[ May 17 22:38:34 Executing start method ("/lib/svc/method/ntp start"). ]
Error: Configuration file '/etc/inet/ntp.conf' not found. See ntpd(1M).
[ May 17 22:38:34 Method "start" exited with status 96. ]
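The numeric status in the last log line corresponds to the symbolic name reported by svcs -xv. The following sketch extracts the most recent method exit status from a log and names it. The function names are invented for this example, only three well-known values from smf_include.sh are mapped, and a POSIX awk (nawk on older Solaris releases) is assumed.

```shell
#!/bin/sh
# last_method_status: read a service instance log on standard input and
# print the status from the last "exited with status N" line, if any.
last_method_status() {
    awk '/ exited with status / {
             for (i = 1; i < NF; i++)
                 if ($i == "status") { s = $(i + 1); sub(/\.$/, "", s) }
         }
         END { if (s != "") print s }'
}

# exit_name STATUS: name a few exit values defined in smf_include.sh.
exit_name() {
    case $1 in
        0)  echo SMF_EXIT_OK ;;
        95) echo SMF_EXIT_ERR_FATAL ;;
        96) echo SMF_EXIT_ERR_CONFIG ;;
        *)  echo "status $1 (not mapped in this sketch)" ;;
    esac
}

# Sample lines taken from the log excerpt above.
log='[ May 17 22:38:34 Executing start method ("/lib/svc/method/ntp start"). ]
[ May 17 22:38:34 Method "start" exited with status 96. ]'

status=$(printf '%s\n' "$log" | last_method_status)
echo "$status = $(exit_name "$status")"    # prints 96 = SMF_EXIT_ERR_CONFIG
```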
 

The error message is the result of the instance start method writing to standard output or standard error. In this case the cause is obvious; in more complex cases the above is simply the beginning of an investigation to debug the root cause of the maintenance state, typically involving some service-specific expertise. Suppose we now create a valid /etc/inet/ntp.conf and wish to clear maintenance state; the instance will attempt to move to online state since it is enabled in the repository:

# svcadm clear ntp
# svcs ntp
STATE STIME FMRI
online 22:55:41 svc:/network/ntp:default
 

Note that the abbreviation ntp was used instead of the full FMRI string svc:/network/ntp:default (which would also have worked but would be longer to type).

One could also have used fmadm repaired 97911e1b-f7a3-cc69-f850-c969e0a7c222.

NOTE: In Solaris, if DNS, LDAP, NIS or AD are not configured to resolve names as a network naming service, set the DNS server to the loopback IP address 127.0.0.1 so that svc:/system/auditd:default and svc:/system/auditset:default do not go into maintenance state.

Maintenance Reasons

In the table below we list the possible reasons for entering maintenance state, and link to the corresponding article that provides more information for this particular reason.

Maintenance Reasons and Corresponding Articles
svcs -xv code	Description
SMF-8000-63	Command line svcadm mark maintenance <fmri> was performed. Note that such a request will not result in a problem diagnosis in fault management software.
SMF-8000-N3	An SMF repository inconsistency exists - the desired state of the instance at the time it is read into the graph engine is invalid.
SMF-8000-HP	A cycle exists in the stated dependencies of this instance.
SMF-8000-7Y	An instance method is failing in a retryable manner, but the number of retries performed without success has exceeded the fault threshold.
SMF-8000-JA	The service has an invalid dependency.
SMF-8000-2A	The restarter specified in the repository for this service instance is invalid.
SMF-8000-8Q, SMF-8000-KS	Instance method failure. For start method failure svcs -x indicates SMF-8000-KS; for others it uses SMF-8000-8Q.
SMF-8000-L5	The instance is failing and successfully restarting with too high a frequency.
SMF-8000-R4	Another service requested that this instance enter maintenance state, such as by using svcadm mark maintenance in its start method.
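The table above lends itself to a simple lookup when triaging many instances. The sketch below pulls the reason code out of a svcs -xv "See:" line and prints a short summary of the corresponding table row. The function names are made up for this example, the summaries are abbreviations of the descriptions above, and the code is assumed to appear in a URL containing /msg/ as in the earlier example output.

```shell
#!/bin/sh
# reason_code: read svcs -xv output on standard input and print the
# SMF-8000-xx code found in any ".../msg/<code>" link.
reason_code() {
    sed -n 's|.*/msg/\(SMF-8000-[A-Z0-9]*\).*|\1|p'
}

# describe_reason CODE: short summary of the matching table row.
describe_reason() {
    case $1 in
        SMF-8000-63) echo "administrator ran svcadm mark maintenance" ;;
        SMF-8000-N3) echo "repository inconsistency (invalid desired state)" ;;
        SMF-8000-HP) echo "dependency cycle" ;;
        SMF-8000-7Y) echo "retryable method failure exceeded fault threshold" ;;
        SMF-8000-JA) echo "invalid dependency" ;;
        SMF-8000-2A) echo "invalid restarter" ;;
        SMF-8000-KS) echo "start method failed" ;;
        SMF-8000-8Q) echo "non-start method failed" ;;
        SMF-8000-L5) echo "failing and restarting too frequently" ;;
        SMF-8000-R4) echo "another service requested maintenance state" ;;
        *)           echo "code not listed in the table: $1" ;;
    esac
}

# Sample line taken from the svcs -xv example earlier in this article.
code=$(printf '%s\n' '   See: http://sun.com/msg/SMF-8000-KS' | reason_code)
echo "$code: $(describe_reason "$code")"    # prints SMF-8000-KS: start method failed
```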

See "Predictive Self-Healing" for additional information. Specifically, see the section entitled "Retrieving Dependency Tree Information" in the "SMF How To Guide", which provides detailed information for troubleshooting service dependency issues. For step-by-step troubleshooting, refer to "Troubleshooting the Service Management Facility" in the Systems Administration Guide: Basic Administration.
