SuperCluster: Domain with iSCSI-based rpool with Solaris Cluster could hang during boot/reboot

Asset ID:	1-72-2270965.1
Update Date:	2017-05-27

Solution Type Problem Resolution Sure

Solution 2270965.1 : SuperCluster: Domain with iSCSI-based rpool with Solaris Cluster could hang during boot/reboot

Applies to:

Oracle SuperCluster Specific Software - Version 2.x to 2.x [Release 2.0]
Oracle SuperCluster M7 Hardware - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

Solaris 11 may hang during boot when the root pool is located on iSCSI devices and Solaris Cluster is in use. A domain is susceptible only if both conditions hold: the root pool is located on iSCSI devices, and Solaris Cluster fencing is enabled on the devices used by the root pool.

The hang may look like this, with no further console output after the iSCSI notice:

{0} ok boot
Boot device: /pci@308/pci@1/usb@0/hub@1/storage@3/disk@0,0:a File and args:
SunOS Release 5.11 Version 11.3 64-bit
Copyright (c) 1983, 2016, Oracle and/or its affiliates. All rights reserved.
NOTICE: Configuring iSCSI to access the root filesystem...

Changes

The problem can occur on any LDom (logical domain) reboot, whether scheduled or unscheduled.

Cause

Bug 25703862 - SuperCluster M7 booting from iSCSI device on ZFS-ES hangs while booting

Solution

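First, identify whether the root pool is located on an iSCSI device. Note the device name in the rpool configuration, then search for it in the iSCSI target list: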

$ zpool status -v rpool
  pool: rpool
 state: ONLINE
  scan: none requested
config:

        NAME                                     STATE     READ WRITE CKSUM
        rpool                                    ONLINE       0     0     0
          c0t600144F0B971DAD6000058A57BC80099d0  ONLINE       0     0     0

errors: No known data errors

$ iscsiadm list target -S | grep c0t600144F0B971DAD6000058A57BC80099d0
           OS Device Name: /dev/rdsk/c0t600144F0B971DAD6000058A57BC80099d0s2
           OS Device Name: /dev/rdsk/c0t600144F0B971DAD6000058A57BC80099d0s2
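The same check can be scripted. The following is a minimal sketch in POSIX sh, assuming the zpool status and iscsiadm list target -S output formats shown in this article and a POSIX-compatible awk (on Solaris that may mean nawk or /usr/xpg4/bin/awk rather than the legacy /usr/bin/awk). The function names rpool_devices and is_iscsi_device are hypothetical; the parsing has only been exercised against sample text, not on a SuperCluster domain.

```shell
#!/bin/sh
# Sketch only: decide from command output whether rpool disks are iSCSI LUNs.
# Assumes the output formats shown in this article.

# Hypothetical helper: print each leaf disk in the config section of
# "zpool status rpool" output read from stdin, without any slice suffix.
rpool_devices() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }   # config header
        in_cfg && NF == 0 { in_cfg = 0 }                      # blank line ends config
        in_cfg && $1 ~ /^c[0-9]+t[0-9A-F]+d[0-9]+(s[0-9]+)?$/ {
            dev = $1
            sub(/s[0-9]+$/, "", dev)   # drop a slice such as s0
            print dev
        }
    '
}

# Hypothetical helper: succeed if disk $1 appears as an "OS Device Name"
# in "iscsiadm list target -S" output read from stdin.
is_iscsi_device() {
    grep -q "OS Device Name: */dev/rdsk/${1}s[0-9]"
}

# Example use on the domain:
#   iscsiadm list target -S > /tmp/iscsi_targets
#   zpool status rpool | rpool_devices | while read -r dev; do
#       is_iscsi_device "$dev" < /tmp/iscsi_targets && echo "$dev is on iSCSI"
#   done
```

Keeping the parsing in functions that read stdin lets the logic be checked against saved output before it is run against live commands.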

Next, confirm that Solaris Cluster is installed and that fencing is enabled on the root pool device:

$ pkg list ha-cluster/system/core
NAME (PUBLISHER)                                  VERSION                    IFO
ha-cluster/system/core (ha-cluster)               4.3-7.1.0                  i--

$ cldev list -v c0t600144F0B971DAD6000058A57BC80099d0
DID Device Full Device Path
---------- ----------------
d2 ssccn1:/dev/rdsk/c0t600144F0B971DAD6000058A57BC80099d0

# cldev show d2

=== DID Device Instances ===

DID Device Name:                                /dev/did/rdsk/d2
  Full Device Path:                                ssccn1:/dev/rdsk/c0t600144F0B971DAD6000058A57BC80099d0
  Replication:                                     none
  default_fencing:                                 global
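The DID lookup and fencing check above can be sketched as two small parsers, assuming the cldev list -v and cldev show formats shown in this article. The function names did_for_device and fencing_of are hypothetical. The sketch conservatively treats any default_fencing value other than nofencing (such as global in the example above) as needing the workaround.

```shell
#!/bin/sh
# Sketch only: read Solaris Cluster DID information from command output.
# Assumes the "cldev list -v" and "cldev show" formats shown in this article.

# Hypothetical helper: print the DID instance (for example d2) that maps to
# disk $1 in "cldev list -v" output read from stdin, once per instance.
did_for_device() {
    awk -v dev="$1" '
        $2 ~ ("/dev/rdsk/" dev "$") && !seen[$1]++ { print $1 }
    '
}

# Hypothetical helper: print the default_fencing value (for example global
# or nofencing) from "cldev show" output read from stdin.
fencing_of() {
    awk '$1 == "default_fencing:" { print $2 }'
}

# Example use on a cluster node:
#   dev=c0t600144F0B971DAD6000058A57BC80099d0
#   did=$(cldev list -v "$dev" | did_for_device "$dev")
#   [ "$(cldev show "$did" | fencing_of)" != nofencing ] && echo "$did is susceptible"
```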

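If the system is susceptible, fencing must be disabled on every device used by the root pool. List the root pool devices, look up the DID instance of each one, and set its default_fencing property to nofencing: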

$ zpool status -v rpool
  pool: rpool
 state: ONLINE
  scan: none requested
config:

        NAME                                     STATE     READ WRITE CKSUM
        rpool                                    ONLINE       0     0     0
          c0t600144F0B971DAD6000058A57BC80099d0  ONLINE       0     0     0

errors: No known data errors

$ cldev list -v c0t600144F0B971DAD6000058A57BC80099d0
DID Device Full Device Path
---------- ----------------
d2 ssccn1:/dev/rdsk/c0t600144F0B971DAD6000058A57BC80099d0

# cldev set -p default_fencing=nofencing d2
Updating shared devices on node 1
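When the root pool uses more than one device (for example a mirrored rpool), the same cldev set command has to be run for each DID instance. The following is a minimal sketch that loops over DID names read from stdin; the function name disable_fencing and the DRY_RUN variable are hypothetical conveniences, not part of Solaris Cluster.

```shell
#!/bin/sh
# Sketch only: run the workaround for every DID instance read from stdin,
# one per line (for example d2). Run as root on a cluster node.
# Set DRY_RUN=1 to print the commands instead of running them.
disable_fencing() {
    while read -r did; do
        [ -n "$did" ] || continue              # skip blank lines
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "cldev set -p default_fencing=nofencing $did"
        else
            # </dev/null keeps cldev from consuming the DID list on stdin
            cldev set -p default_fencing=nofencing "$did" </dev/null || return 1
        fi
    done
}

# Example use:
#   printf 'd2\n' | DRY_RUN=1 disable_fencing   # review the commands first
#   printf 'd2\n' | disable_fencing
```

Printing the commands first makes it easy to confirm that only the root pool devices are affected before any fencing setting is changed.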

Confirm that fencing has been disabled by checking that the default_fencing field is set to nofencing:

# cldev show d2

=== DID Device Instances ===

DID Device Name:                                /dev/did/rdsk/d2
  Full Device Path:                                ssccn1:/dev/rdsk/c0t600144F0B971DAD6000058A57BC80099d0
  Replication:                                     none
  default_fencing:                                 nofencing
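The final check can also be automated. The following minimal sketch, assuming the cldev show layout shown above, reads output covering one or more DID devices and reports any device whose default_fencing is still not nofencing; the function name report_fenced_devices is hypothetical.

```shell
#!/bin/sh
# Sketch only: read "cldev show" output from stdin, print every DID device
# whose default_fencing is not nofencing, and return non-zero if any is found.
report_fenced_devices() {
    awk '
        BEGIN { bad = 0 }
        $1 == "DID" && $2 == "Device" && $3 == "Name:" { dev = $4 }
        $1 == "default_fencing:" && $2 != "nofencing" { print dev ": " $2; bad = 1 }
        END { exit bad }
    '
}

# Example use after the change:
#   cldev show d2 | report_fenced_devices && echo "fencing disabled on rpool devices"
```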
