FAB: Non-Hardware: Proactive: ZFS Storage Appliance Clustering Setup fails for certain ZS3-2 Systems due to incorrect programming of SEEPROMs.

Asset ID:	1-73-1610362.1
Update Date:	2017-03-08
Keywords:

Solution Type FAB (standard) Sure

Solution 1610362.1 : FAB: Non-Hardware: Proactive: ZFS Storage Appliance Clustering Setup fails for certain ZS3-2 Systems due to incorrect programming of SEEPROMs.

Applies to:

Oracle ZFS Storage ZS3-2 - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.
__________

Affected Parts:

7016786 - Motherboard for the ZS3-2 controller (head node)
7019696 - Motherboard FRU for the ZS3-2 controller (head node)

Symptoms

In the AK shell, the "cluster" option under "configuration" does not exist on affected appliances.

All affected ZS3-2 appliances will display the following message under the Configuration -> Cluster tab on the AK BUI:

No cluster links found. This system does not support clustering, or it must be upgraded before clustering can be configured.

Attempts to configure the cluster by clicking on the SETUP tab will display the following message:

This appliance does not contain the required Sun Clustron cluster interconnect driver, or is of a model that does not support clustering. Contact your service provider for information on upgrading to cluster operation.
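
When triaging a service request, the two messages above can be matched against text pasted from the BUI. The following Python sketch is illustrative only: the function and constant names are hypothetical, and the fragments are simply copied from the messages quoted above. A match indicates the symptom only; whether a system is actually affected is determined by the workflow described under Resolution.

```python
# Fragments copied from the symptom messages quoted in this article.
SYMPTOM_FRAGMENTS = (
    "No cluster links found",
    "does not contain the required Sun Clustron cluster interconnect",
)


def matches_known_symptom(message):
    """Return True if the text contains one of the symptom fragments.

    Matching is case-insensitive. This helper is a hypothetical triage aid,
    not part of the appliance software.
    """
    text = message.lower()
    return any(fragment.lower() in text for fragment in SYMPTOM_FRAGMENTS)
```

For example, passing the full text shown on the Configuration -> Cluster tab of an affected appliance would return True, while unrelated text would return False.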

Impact

Some ZFS Storage ZS3-2 systems have incorrectly programmed SEEPROMs. This prevents these appliances from being clustered, that is, from running two appliance heads (controllers) with shared storage.

ZS3-2 systems that are not used in a clustered environment are not affected by this issue. It is still recommended to upgrade these systems to correct this anomaly.

Changes

Contributing Factors

This error occurs only on systems that are being configured for clustered operation.

Cause

The SEEPROM connected to the device is programmed incorrectly (not programmed).

Initially Manufacturing did not check the details of this data. A new test process has been implemented to rectify this test gap.

Solution

Workaround

A workaround to reprogram affected SEEPROMs is available via the normal customer support channels.

Resolution

This proactive remediation activity and the actual SEEPROM reprogramming are to be done by Oracle TSC staff only and do not involve the field.

A workflow has been developed that determines whether a system is affected. If the system is not affected, the workflow does not attempt any reprogramming.

This workflow is available at the URL below (accessible to TSC personnel only).

https://stbeehive.oracle.com/content/dav/st/AmberRoadSupport/Documents/Workflow/clustron_eeprom.akwf

There is a new version of the clustron_eeprom workflow available now.

https://stbeehive.oracle.com/content/dav/st/AmberRoadSupport/Documents/Workflow/clustron_eeprom-v2.1.akwf

The initial version worked only with 2013.1.0.x and 2013.1.1.x; the current version works on any version above 2013.1.0.1.
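
The version applicability described above can be expressed as a small check. The following Python sketch is illustrative only: the function names are hypothetical, release strings are assumed to be plain dotted integers (for example "2013.1.2.0"), and "above 2013.1.0.1" is read as strictly greater than 2013.1.0.1.

```python
def parse_release(release):
    """Split a dotted release string such as '2013.1.2.0' into a tuple of ints."""
    return tuple(int(part) for part in release.split("."))


def supported_workflows(release):
    """Return the clustron_eeprom workflow files that apply to a release.

    Assumptions (taken from the text of this article, not from the workflow
    itself): the initial workflow covers 2013.1.0.x and 2013.1.1.x, and
    v2.1 covers any release strictly above 2013.1.0.1.
    """
    version = parse_release(release)
    workflows = []
    if version[:3] in ((2013, 1, 0), (2013, 1, 1)):
        workflows.append("clustron_eeprom.akwf")
    if version > (2013, 1, 0, 1):
        workflows.append("clustron_eeprom-v2.1.akwf")
    return workflows
```

Under these assumptions, a release such as 2013.1.2.0 is covered by the v2.1 workflow only, while 2013.1.0.1 itself is covered by the initial workflow only.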

NOTE: After uploading the workflow to the appliance, it is not immediately visible as it is considered a 'hidden' workflow.
Engineer/FE must hold the SHIFT key and click the + next to workflow to reveal it.

This failure is caused by an issue with the programming of the SEEPROM on the motherboard of the ZS3-2.

This workflow updates the SEEPROM on the controller with the correct contents. The workflow needs to be executed only once on each system to burn the SEEPROM contents. A reboot of the controller is required after execution of the workflow.

The following prep work is needed prior to and after execution of this workflow.

   a. If the BUI is accessible through the net0 management port, no further action is necessary.
   b. If access to the BUI is being secured through the connection to the dlpi cluster port, ensure that at least the net0 and dlpi ports are connected to the network. If the BUI is not accessible, connect the net0 Ethernet cable to the dlpi port and check for access (verify that correct route information is present for the interface being used). Alternatively, use the shell at this point to download and run the workflow.
   c. If step b has been implemented, after running the workflow and during the appliance reboot, disconnect the Ethernet cable from the dlpi port and re-attach the cluster cables and Ethernet cables as configured/necessary.

Identification of Affected Parts (how to)

An affected Customer List is being maintained by the TSC.

Comments

For questions about this knowledge article please contact Renee Bennett - Director, Global Disk Technology Service Center.

References

BugID: 17842422
Other References: NCAT-7204
SR #: 3-8205121173, 3-8182189801, 3-8217496896, 3-8112969351, 3-8171301091, 3-8189956401, 3-8211070451, 3-8130877324, 3-8124053364

Contacts

Contributor/Submitter: zuheir.totari@oracle.com, brian.sutcliffe@oracle.com, mark.rugare@oracle.com
Responsible Engineer: michael.bondurant@oracle.com
Responsible Manager: tejinder.singh@oracle.com, curtis.decotis@oracle.com
Business Unit Group: NWS (Network Storage)

Attachments

This solution has no attachment