Asset ID: | 1-71-1639070.1 |
Update Date: | 2018-03-28 |
Solution Type: Technical Instruction Sure
Solution 1639070.1: Steps For Clearing Devices in Unusable or Failing State From cfgadm After LUNs Have Already Been Removed
Related Items |
- Sun Storage FC HBA
- Emulex FC HBA
- Solaris Operating System
- Sun SPARC Enterprise M5000 Server
- Sun Storage FCoE CNA
- Qlogic FC HBA
|
Related Categories |
- PLA-Support>Sun Systems>DISK>HBA>SN-DK: FC HBA
|
Applies to:
Sun Storage FCoE CNA - Version All Versions to All Versions [Release All Releases]
Sun Storage FC HBA - Version All Versions to All Versions [Release All Releases]
Sun SPARC Enterprise M5000 Server - Version All Versions to All Versions [Release All Releases]
Emulex FC HBA - Version All Versions to All Versions [Release All Releases]
Solaris Operating System - Version 11.2 to 11.2 [Release 11.0]
Information in this document applies to any platform.
Goal
This document covers Fibre Channel (FC) LUNs, not iSCSI or SCSI LUNs. It primarily addresses LUNs and Targets that were accessed through an FC switch fabric.
For issues clearing device entries of FC LUNs from a Direct Attached Storage (DAS) unit, see the section "Direct Attached Storage (DAS)".
The procedures outlined here are practical when dealing with a single LUN or a small group of LUNs.
If a large number of LUNs have been removed, a reboot is the quickest and cleanest way to clear them.
If you have not yet started removing LUNs and/or Targets from Solaris, go to the following document first:
Best Practice "Before" Removing LUN(s) and/or Target(s) From a Solaris Server (Doc ID 1639048.1)
Before Beginning
Before attempting any of the steps in this document, it is highly recommended that an Explorer output be collected so there is a before and after reference. See the "Oracle Explorer Data Collector" section at the end of this document for instructions on preserving previous Explorer outputs, upgrading to the latest version of Explorer, and collecting a new Explorer with appropriate options.
This document deals with LUNs that have already been intentionally removed from the Solaris server's view, but whose device references in cfgadm, format, /var/adm/messages, etc. still exist, and where attempts to clear them using the cfgadm command (see the "cfgadm command" section below) have not succeeded.
These issues are primarily due to the LUNs' device tree entries still being seen or used by one or more applications on the server.
The disk may be seen in format as:
6. c4t500601653XXXXXXXd0 <drive type unknown> /pci@1,700000/SUNW,qlc@0,1/fp@0,0/ssd@w500601653XXXXXXX,0
or, in the messages file, something like this:
Sep 10 10:51:28 server01 scsi: [ID 107833 kern.warning] WARNING: /pci@1,700000/SUNW,qlc@0,1/fp@0,0/ssd@w500601653XXXXXXX,0 (ssd50):
Sep 10 10:51:28 server01 drive offline
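When many such warnings have been logged, the affected device paths can be collected in one pass. The following is a minimal sketch, assuming the two-line warning format shown above (a WARNING line carrying the device path, followed by a line containing "drive offline"). The function name list_offline_devices is hypothetical and not part of Solaris.

```shell
# Sketch: list device paths that logged a "drive offline" warning.
# Reads messages text on stdin; assumes the two-line format shown above.
list_offline_devices() {
    awk '
        /WARNING:/ {
            # Remember the device path that follows the "WARNING:" token.
            path = ""
            for (i = 1; i <= NF; i++)
                if ($i == "WARNING:" && i < NF) path = $(i + 1)
            next
        }
        /drive offline/ && path != "" {
            # Print each offline device path only once.
            if (!seen[path]++) print path
            path = ""
            next
        }
        { path = "" }
    '
}

# Usage:
#   list_offline_devices < /var/adm/messages
```

Each printed path can then be matched against the intentionally removed LUNs before any cleanup is attempted.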
Any LUNs or Targets that are in a non-optimal state and were not intentionally removed should be investigated before proceeding.
See SAN Fibre Channel (FC) Storage Connectivity Issues (Doc ID 1502843.1).
Reviewing the "How to Help Avoid These Types of Issues In The Future" section below is also recommended.
cfgadm command
This command is used when a subset (not all) of the LUNs under certain Targets has been removed on the storage side.
Normally, when a LUN is removed, Solaris changes it to a "failed/failing" state in the cfgadm command output; within 30 seconds it changes to an "unusable" state.
1. If the LUNs have not changed to the unusable state yet, try running the following command, then check again:
cfgadm -c configure c#
where # in c# is the FC HBA port controller number for each path the LUNs are seen under.
2. To view LUNs and Targets:
cfgadm -o show_FCP_dev -al (for Solaris 8/9 and above)
or
cfgadm -o show_SCSI_LUN -al (for Solaris 10 and above)
3. Once the LUN is in an "unusable" state, use this command to clear the unusable entries:
cfgadm -o unusable_FCP_dev -c unconfigure c#::WWPN (for Solaris 8/9 and above)
or
cfgadm -o unusable_SCSI_LUN -c unconfigure c#::WWPN (for Solaris 10 and above)
WWPN is the Target World Wide Port Name.
c# is the FC HBA controller number under which the WWPN is seen.
Proper command syntax does not include the LUN.
Oracle Solaris 11 Information Library
man pages section 1M: System Administration Commands
cfgadm command
http://docs.oracle.com/cd/E23824_01/html/821-1462/cfgadm-fp-1m.html
...
Example 6 Removing Offlined Solaris Device Nodes for a Target Device
The following command removes offlined Solaris device nodes for a target device:
# cfgadm -c unconfigure -o unusable_SCSI_LUN c0::210000203708b606
4. Then clean the device tree by running the following command:
devfsadm -Cv
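Steps 2 and 3 can be combined in a small helper that prints, without running, the unconfigure command for each target that still has unusable LUNs. This is a minimal sketch, assuming the Solaris 10 and above option names given above and a listing in which the Ap_Id (c#::WWPN,LUN) is the first column and the condition is the last column. The function name unusable_unconfigure_cmds is hypothetical.

```shell
# Sketch: print (do not run) one unconfigure command per target that has
# at least one LUN in the "unusable" condition.
# Reads "cfgadm -o show_SCSI_LUN -al" output on stdin.
# Assumption: Ap_Id is the first column, condition is the last column.
unusable_unconfigure_cmds() {
    awk '
        $1 ~ /::/ && $NF == "unusable" {
            ap = $1
            sub(/,[0-9]+$/, "", ap)   # proper syntax does not include the LUN
            if (!seen[ap]++)
                print "cfgadm -o unusable_SCSI_LUN -c unconfigure " ap
        }
    '
}

# Usage: review the printed commands before running any of them.
#   cfgadm -o show_SCSI_LUN -al | unusable_unconfigure_cmds
```

Printing the commands instead of executing them leaves a chance to confirm that every listed target belongs to LUNs that were intentionally removed.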
If the above procedure does not clear the device references, proceed to the Solution section in this document.
Note 1. Make sure you have read and implemented the "Before Beginning" section items above.
Note 2. The quickest, cleanest way to resolve this is to reboot the server, if possible. Otherwise continue with the Solution section below. It is sometimes (but not always) possible to determine what process (such as Symantec Veritas Volume Manager (VxVM) or others) is holding onto the device entries in the device tree. Many times the process cannot be identified and the server has to be rebooted to clear the issue. Other times the process, applications, or commands hang and cannot be killed or stopped.
Even if the device was not being used by any of the following applications, the LUNs / Targets still need to be removed from the applications' view to release the device tree entries so that they can be cleared.
The System Administrator must determine which of the applications below are installed on the server.
Solution
A) FC Switch Attached Storage
NON-Oracle Applications
#1 - Symantec Veritas Volume Manager (VxVM)
#2 - EMC PowerPath Multipathing
#3 - Emulex Native Emlxadm Utility
#4 - Hitachi Multipathing HDLM
#5 - NetApp SnapDrive (other array side cloning applications as well)
Oracle Applications
#6 - Oracle/Sun Cluster
B) Direct Attached Storage (DAS)
#7 - Symantec Veritas NetBackup (NBU)
1. Symantec Veritas Volume Manager (VxVM)
Customer should engage their NON-Oracle Symantec Veritas Volume Manager (VxVM) support for assistance.
These are the steps that should be performed:
-1- Temporarily disable Veritas monitoring software
-2- Clear the LUNs that were removed from server from "vxdisk list" output
-3- Remove LUNs from Veritas DMP Multipathing Software VxDMP
-4- Attempt luxadm -e offline of LUNs:
- Perform "luxadm -e offline /dev/rdsk/cXtXdXs2" for each path to the same LUN, for example:
luxadm -e offline /dev/rdsk/c2t50060Exxxxxxxxxxx60s2
luxadm -e offline /dev/rdsk/c4t50060Exxxxxxxxxxx60s2
- View and Check for Change in Device Status then try to clear with cfgadm command
-5- Re-Enable Veritas Monitoring Software
If steps above did not clear device then select procedure for another application or do a reconfiguration reboot to clear issue.
These are the steps with the VxVM commands that should be followed, BUT keep these notes internal:
Oracle has a contractual obligation to Symantec not to provide support by way of KM articles for Symantec products.
---- STEPS: ----
#1 - Temporarily Disable Veritas Monitoring Software
- Determine if Veritas Monitoring Software is running:
ps -ef | grep vxesd
If it is then disable it by running following command:
vxddladm stop eventsource
- View and Check for Change in Device Status then try to clear with cfgadm command
#2 - Clearing from "vxdisk list" Command Output
- Correlate devices that were removed from server to the devices in vxdisk list
- Remove LUNs from vxdisk list
example:
A) vxdisk list
B) vxdisk rm HDS9980V0_81
- Verify LUNs not in vxdisk list command output
- View and Check for Change in Device Status then try to clear with cfgadm command
#3 - Remove LUNs from Veritas DMP Multipathing Software VxDMP
example:
vxdmpadm getsubpaths ctlr=c2
vxdmpadm getsubpaths dmpnodename=emc0_0360
vxdmpadm -f disable path=c2t500604xxxxxxxxxxx0s2
- View and Check for Change in Device Status then try to clear with cfgadm command
#4 - Attempt luxadm -e offline of LUNs
- Perform for each path to same LUN
example
luxadm -e offline /dev/rdsk/c2t50060Exxxxxxxxxxx60s2
luxadm -e offline /dev/rdsk/c4t50060Exxxxxxxxxxx60s2
- View and Check for Change in Device Status then try to clear with cfgadm command
#5 - Re-Enable Veritas Monitoring Software
- To enable Veritas Monitoring Software, run the following command:
vxddladm start eventsource
Note: Only re-enable after all relevant application procedures have been attempted.
If steps above did not clear device then select procedure for another application or do a reconfiguration reboot to clear issue.
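Because the luxadm offline step has to be repeated for every path to the same LUN, it can help to generate the commands from a list of device names. The following is a minimal sketch that only prints the commands. The function name print_luxadm_offline is hypothetical, and the device names in the usage line are made-up placeholders.

```shell
# Sketch: print "luxadm -e offline" for each path to the same LUN.
# Arguments are device names without the slice suffix; slice 2 is
# appended, matching the /dev/rdsk/...s2 form used in the examples above.
print_luxadm_offline() {
    for dev in "$@"; do
        printf 'luxadm -e offline /dev/rdsk/%ss2\n' "$dev"
    done
}

# Usage (placeholder device names, one per path to the LUN):
#   print_luxadm_offline c2t5006016012345678d0 c4t5006016812345678d0
```

After the printed commands have been reviewed and run, check the device status again and retry the cfgadm unconfigure.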
2. EMC PowerPath Multipathing
If needed engage your NON-Oracle EMC support for assistance with these steps.
Note: If the LUNs were being used by Symantec Veritas Volume Manager (VxVM), go through the VxVM procedure first, since VxVM is above the EMC PowerPath layer.
---- STEPS: ----
#1 - powermt command
- Run the following EMC PowerPath Multipathing command to check the I/O paths. It will detect a dead path and remove it from the EMC path list.
powermt check
- View and Check for Change in Device Status then try to clear with cfgadm command
#2 - Attempt luxadm -e offline of LUNs
- Perform "luxadm -e offline /dev/rdsk/cXtXdXs2" for each path to the same LUN, for example:
luxadm -e offline /dev/rdsk/c2t50060Exxxxxxxxxxx60s2
luxadm -e offline /dev/rdsk/c4t50060Exxxxxxxxxxx60s2
- View and Check for Change in Device Status then try to clear with cfgadm command
3. Emulex Native Emlxadm Utility
If needed engage your NON-Oracle Emulex support vendor for assistance with these steps:
---- STEPS: ----
#1 - Look for Emulex Native Emlxadm Utility daemon
# svcs -a | grep elxhba
online 21:10:33 svc:/application/elxhbamgr:default
If daemon not present then select procedure for another application or do a reconfiguration reboot to clear issue.
#2 - Disable it temporarily:
# svcadm disable svc:/application/elxhbamgr:default
#3 - Verify it is disabled
# svcs -a | grep elxhba
disabled 21:59:46 svc:/application/elxhbamgr:default
- View and Check for Change in Device Status then try to clear with cfgadm command
If steps above did not clear device then select procedure for another application or do a reconfiguration reboot to clear issue.
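The check in steps #1 and #3 can be scripted so that the same test is used before and after the service is disabled. This is a minimal sketch, assuming the svcs output format shown above (state in the first column, service FMRI in the last). The function name elxhbamgr_state is hypothetical.

```shell
# Sketch: report the state of the elxhbamgr service from "svcs -a" output
# read on stdin. Prints the state (for example "online" or "disabled"),
# or "absent" when no matching service is listed.
elxhbamgr_state() {
    awk '
        $NF ~ /elxhbamgr/ { print $1; found = 1; exit }
        END { if (!found) print "absent" }
    '
}

# Usage:
#   svcs -a | elxhbamgr_state
```

A result of "absent" corresponds to the case where the daemon is not present and another application's procedure should be selected.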
4. Hitachi Multipathing HDLM
Hitachi Dynamic Link Manager (HDLM)
If needed engage your NON-Oracle Hitachi Dynamic Link Manager (HDLM) support for assistance with these steps:
---- STEPS: ----
#1 - Remove LUNs from HDLM
Use procedures in following Hitachi manual:
"Hitachi Dynamic Link Manager User’s Guide for Solaris Systems"
- View and Check for Change in Device Status then try to clear with cfgadm command
If steps above did not clear device then select procedure for another application or do a reconfiguration reboot to clear issue.
5. NetApp SnapDrive (other array side cloning applications as well)
NetApp has an array-side snapshot feature, "snapdrive", used to do backups on a frequent basis. It creates a snapshot, mounts it, does the backup, then destroys the snapshot.
This snapshot destroy sometimes leaves stubborn "unusable" devices behind.
There are currently no recommended steps to address these stubborn "unusable" devices.
Engage your NON-Oracle NetApp "snapdrive" support for assistance.
If steps above did not clear device then select procedure for another application or do a reconfiguration reboot to clear issue.
6. Oracle/Sun Cluster
If needed open a Service Request (SR) for Oracle Solaris Cluster support for assistance with these steps:
---- STEPS: ----
#1 - Remove LUNs from Solaris Cluster
Use procedures in following Oracle Solaris Cluster knowledge document:
Solaris Cluster 3.x: cfgadm fails to offline a Solaris Cluster Disk, cfgadm unconfigure failed to offline device, busy (Doc ID 1008145.1)
- View and Check for Change in Device Status then try to clear with cfgadm command
If steps above did not clear device then select procedure for another application or do a reconfiguration reboot to clear issue.
7. Symantec Veritas NetBackup (NBU)
Customer should engage their NON-Oracle Symantec Veritas NetBackup (NBU) support for assistance.
These are the steps that should be performed:
-1- Comment out (or remove) the offending entries from sg.conf
-2- Rescan sg Driver:
- Run the following commands:
update_drv -fv sg
devfsadm -Cv
- View and Check for Change in Device Status then try to clear with cfgadm command
If steps above did not clear device then select procedure for another application or do a reconfiguration reboot to clear issue.
These are the steps with the sg driver changes that should be followed, BUT keep these notes internal:
Oracle has a contractual obligation to Symantec not to provide support by way of KM articles for Symantec products.
---- STEPS: ----
#1 - Comment out (or remove) the offending entries from sg.conf
- cd /kernel/drv/
- edit sg.conf file
Comment out (or remove) sg entries for removed LUNs.
Example:
name="sg" parent="fp" target=0 lun=0 fc-port-wwn="20130xxxxxxxx3de";
name="sg" parent="fp" target=0 lun=1 fc-port-wwn="20130xxxxxxxx3de";
name="sg" parent="fp" target=0 lun=0 fc-port-wwn="20330xxxxxxxx3de";
name="sg" parent="fp" target=0 lun=1 fc-port-wwn="20330xxxxxxxx3de";
name="sg" parent="fp" target=0 lun=0 fc-port-wwn="20320xxxxxxxx3de";
name="sg" parent="fp" target=0 lun=1 fc-port-wwn="20320xxxxxxxx3de";
name="sg" parent="fp" target=0 lun=0 fc-port-wwn="20120xxxxxxxx3de";
name="sg" parent="fp" target=0 lun=1 fc-port-wwn="20120xxxxxxxx3de";
#2 - Rescan sg Driver
- Run the following commands:
update_drv -fv sg
devfsadm -Cv
- View and Check for Change in Device Status then try to clear with cfgadm command
If steps above did not clear device then select procedure for another application or do a reconfiguration reboot to clear issue.
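The sg.conf edit in step #1 can be previewed before the real file is touched. The following is a minimal sketch that comments out the entries for one target port WWN, assuming the entry format shown in the example above and that a leading "#" marks a comment line in the driver configuration file. The function name comment_out_sg_entries and the file names in the usage lines are hypothetical.

```shell
# Sketch: comment out sg.conf entries for one target port WWN.
# Reads sg.conf text on stdin and writes the edited text on stdout;
# this function never modifies the original file.
comment_out_sg_entries() {
    wwn=$1
    awk -v wwn="$wwn" '
        # Only touch lines that are not already comments and name this WWN.
        $0 !~ /^[ \t]*#/ && index($0, "fc-port-wwn=\"" wwn "\"") {
            print "#" $0
            next
        }
        { print }
    '
}

# Usage (placeholder WWN and file names; keep a copy of the original):
#   cp /kernel/drv/sg.conf /kernel/drv/sg.conf.orig
#   comment_out_sg_entries 20130xxxxxxxx3de < /kernel/drv/sg.conf.orig > /tmp/sg.conf.new
```

Comparing the generated file with the original before copying it into place confirms that only the entries for the removed LUNs were changed.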
How to Help Avoid These Types of Issues In The Future
- Overview/Disclaimer
There is no single universal way to remove devices; too many variables affect the equation. Oracle support can therefore only offer generic, best-effort advice, especially given the number of non-Oracle third-party applications that can be involved.
First, fabric devices never unconfigure themselves. When removing device(s) from the system (be they a single LUN, multiple LUNs, or an entire target), some iteration of 'cfgadm' will need to be run to clear them from the system cleanly.
Second, before removal it is often necessary to take additional steps to remove the device(s) from whatever application was using them, can view them, or is aware of them.
This is by far the most common problem/misunderstanding seen when removing LUNs.
Adding LUNs is clearly a "bottom up" operation: the LUN is first discovered by Solaris, then configured, and only then can it be put to use by various applications.
The opposite is true when removing LUNs, which should be viewed as a "top down" operation. The LUN has to be released by its application(s) first; then and only then will Solaris be able to unconfigure it and clean up the device entries.
This could be as simple as unmounting a filesystem, or it could involve things like VxVM, NetBackup, Cluster software, Oracle, ZFS, SVM, HDLM, PowerPath, etc. It is not possible to predict all the applications that could be involved, nor exactly how one would safely remove the LUN(s) from use in all of those applications.
Too often devices are simply yanked away first, and the device tree cleanup is attempted after the fact. Sometimes the issue can be worked around and cleaned up; sometimes it cannot, and a reboot is required to clear it.
- Keeping Solaris up to date on patches and/or SRU level for Solaris 11 and beyond
Ongoing enhancements to the Solaris Operating System are helping to avoid and/or resolve these types of issues. Keeping Solaris up to date will help prevent these issues from occurring.
- Employ Best Practices
See Best Practice "Before" Removing LUN(s) and/or Target(s) From a Solaris Server (Doc ID 1639048.1)
Oracle Explorer Data Collector
If you have problems, escalate this to Oracle support and upload:
- If you have a previous Explorer file, upload that to the SR
- Upgrade to the latest Explorer version if possible, collect an Explorer, and upload that to the SR
See:
Oracle Explorer Data Collector - Product Information Center (Doc ID 1312847.1)
Oracle Support File Uploads (Doc ID 1547088.2)
Fibre Channel (FC) SAN : How To Collect and Send Explorer Data to Oracle SAN Support (Doc ID 1273941.1)