Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution
2184100.1 : SPARC M7 Series Servers : ZFS fault on eUSB disk after CMIOU replacement
Applies to:
SPARC M7-8 - Version All Versions and later
Oracle SuperCluster M7 Hardware - Version All Versions and later
SPARC M7-16 - Version All Versions and later
Information in this document applies to any platform.

Symptoms
After a CMIOU is replaced on SPARC M7 servers or SPARC SuperCluster M7 servers where iSCSI over IPoIB is used as a boot option, the domain that owns the eUSB disk on the replaced CMIOU may report a ZFS fault when the domains are restarted. For instance:

SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Thu Sep 15 22:03:09 PDT 2016
PLATFORM: unknown, CSN: unknown, HOSTNAME:
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: 21ba6416-f79f-4b70-9059-e1c705462754
DESC: ZFS device 'id1,sd@SMICRON__eUSB_DISK_______17F0022700070705/a' in pool 'bpool' failed.
AUTO-RESPONSE: No automated response will occur.
IMPACT: Fault tolerance of the pool may be compromised.
REC-ACTION: Use 'fmadm faulty' to provide a more detailed view of this event. Run 'zpool status -lx' for more information. Please refer to the associated reference document at http://support.oracle.com/msg/ZFS-8000-D3 for the latest service procedures and policies regarding this diagnosis.

288 Thu Sep 15 22:03:10 2016 Fault Repair minor
    Fault fault.fs.zfs.device on component - cleared
287 Thu Sep 15 21:58:52 2016 Fault Fault critical
    Fault detected at time = Thu Sep 15 21:58:35 2016. The suspect component: - has fault.fs.zfs.device with probability=100. Refer to http://support.oracle.com/msg/ZFS-8000-D3 for details

See the documents referenced below for further information about iSCSI over IPoIB.

Replacing a CMIOU does not require transferring the eUSB disk from the suspect CMIOU to the new CMIOU unless the eUSB disk is the only device in the boot pool.
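
When many fault messages have to be triaged, the affected device and pool can be pulled out of the DESC line mechanically. The following is only a minimal sketch: `parse_zfs_desc` is a hypothetical helper name, and the pattern assumes the DESC wording is exactly as shown in the message above.

```python
import re

# DESC line copied from the fault message in this note.
desc = (
    "DESC: ZFS device "
    "'id1,sd@SMICRON__eUSB_DISK_______17F0022700070705/a' "
    "in pool 'bpool' failed."
)


def parse_zfs_desc(line):
    """Hypothetical helper: return (device_id, pool_name), or None if the
    line does not follow the DESC wording shown in this note."""
    m = re.search(r"ZFS device '([^']+)' in pool '([^']+)' failed", line)
    return m.groups() if m else None


device, pool = parse_zfs_desc(desc)
print("device:", device)
print("pool:  ", pool)
```

A pool name of bpool together with an eUSB device identifier is the combination this note is concerned with; other pools or device types would point to a different problem.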
Cause
This situation may be due to the eUSB disk having an incorrect or missing label. In such a case, the eUSB disk may fail to join the boot pool.

The fault is raised by the domain that owns the eUSB disk and uses it as part of the boot pool to boot using iSCSI over IPoIB. That domain can be the control/primary domain or any guest domain. Make sure to identify which ldom the fault is coming from: the fault is proxied from the guest ldom to the control/primary domain, and the hostid of the domain where the fault was diagnosed is reported in the 'fmadm faulty' output. See IO faults proxying in LDOM environment (Doc ID 1942045.1).

# fmdump
Sep 15 21:58:35.1780 e8a130c0-90b6-4461-aadf-df9502fb85a9 ZFS-8000-D3 Diagnosed
...
Problem in: zfs://pool=3794d1209385ba27/vdev=a8e0f7cc0efd0465/pool_name=bpool/vdev_name=id1,sd@SMICRON__eUSB_DISK_______17F0022700070705/a
# fmadm faulty
--------------- ------------------------------------ -------------- ---------
Problem Status : open
----------------------------------------
FRU
Description : ZFS device 'id1,sd@SMICRON__eUSB_DISK_______17F0022700070705/a'
Response : No automated response will occur.
Impact : Fault tolerance of the pool may be compromised.
Action : Use 'fmadm faulty' to provide a more detailed view of this event.

From the ldom reporting the fault, check the error as reported and the details of the fault ('fmadm faulty', 'fmdump -v', 'fmdump -eV'). The UUID must be the same as the one proxied to the control domain.

# fmdump -e
Sep 15 21:56:53.8380 ereport.fs.zfs.vdev.bad_label

# fmdump -eV
TIME UUID SUNW-MSG-ID
TIME CLASS ENA
nvlist version: 0
mod-name = zfs-diagnosis
fault-list-sz = 0x1
fru = (embedded nvlist)
resource = (embedded nvlist)
The boot pool of the ldom owning the eUSB disk is reported as degraded in the 'zpool status' output:

# zpool status -v
pool: bpool
config:
NAME STATE READ WRITE CKSUM
device details:
12168998648951932005 UNAVAIL was /dev/dsk/c2t0d0s0

# bootadm boot-pool list
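
To confirm that the fault seen with fmdump and the UNAVAIL device seen with 'zpool status' are the same vdev, the two identifiers can be compared. In this example the hexadecimal vdev value in the 'Problem in:' line converts to the decimal number printed by 'zpool status'. The sketch below assumes only the zfs:// layout that appears in this note; `parse_zfs_fmri` is a hypothetical helper name.

```python
import re

# Both strings are copied from the outputs shown in this note.
fmri = (
    "zfs://pool=3794d1209385ba27/vdev=a8e0f7cc0efd0465/"
    "pool_name=bpool/"
    "vdev_name=id1,sd@SMICRON__eUSB_DISK_______17F0022700070705/a"
)
zpool_line = "12168998648951932005 UNAVAIL was /dev/dsk/c2t0d0s0"


def parse_zfs_fmri(text):
    """Hypothetical helper: split the zfs:// string into its four fields.
    The hexadecimal pool and vdev values are converted to integers."""
    m = re.match(
        r"zfs://pool=([0-9a-f]+)/vdev=([0-9a-f]+)"
        r"/pool_name=([^/]+)/vdev_name=(.+)$",
        text,
    )
    if m is None:
        return None
    return {
        "pool_guid": int(m.group(1), 16),
        "vdev_guid": int(m.group(2), 16),
        "pool_name": m.group(3),
        "vdev_name": m.group(4),  # may itself contain '/', so matched last
    }


info = parse_zfs_fmri(fmri)
zpool_guid = int(zpool_line.split()[0])  # first column of the UNAVAIL line

print(info["pool_name"], info["vdev_guid"])
print("same vdev:", info["vdev_guid"] == zpool_guid)
```

A match ties the fault to the device that was previously /dev/dsk/c2t0d0s0 in bpool.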
Solution
If the fault is due to an incorrect or missing label, the disk must be properly labelled. There is no need to replace the eUSB disk.
The fault on the control/primary domain indicates where the fault is coming from. In this example: host-id = 84f9adb4. Use the 'ldm ls-dom -l' command from the control domain to locate the ldom and the eUSB disk. CMIOU2 was replaced previously in this example. See SPARC M7 Series Servers: Device Paths (Doc ID 2063247.1) to identify the path and bus/root complex.

# ldm ls-dom -l
NAME STATE FLAGS CONS VCPU MEMORY UTIL NORM UPTIME
...
HOSTID
...
IO

# ldm list-rsrc-group -l /SYS/CMIOU2
...
IO

The eUSB disk can only be managed from the ldom owning the resource. After logging into the identified ldom (ssccn1-dom3 in the previous example), the bpool is reported as degraded from this ldom.

# zpool status -v
pool: bpool
config:
NAME STATE READ WRITE CKSUM
...
device details:
12168998648951932005 UNAVAIL was /dev/dsk/c2t0d0s0
From the previously identified ldom, label the eUSB disk:

# format
...
6. c2t0d0 <MICRON-eUSB DISK-1112-1.89GB>
   /pci@30e/pci@2/usb@0/storage@1/disk@0,0
7. c3t0d0 <MICRON-eUSB DISK-1112-1.89GB>
   /pci@313/pci@2/usb@0/storage@1/disk@0,0

# format c2t0d0
format> label yes
format> quit

The respective device can then be detached from and re-attached to the pool.

# zpool detach bpool 12168998648951932005
# zpool status -v
pool: bpool
config:
NAME STATE READ WRITE CKSUM

Use the FMA commands to make sure that no other faults exist for this eUSB disk and bpool.
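
Before labelling, it is worth double-checking that the disk selected in format is the one named in the 'was /dev/dsk/...' column of 'zpool status' and that it sits under the expected device path. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, and the parser expects the two-line-per-disk layout of the format listing shown above.

```python
import re

# Disk entries copied from the format listing in this note.
listing = """\
6. c2t0d0 <MICRON-eUSB DISK-1112-1.89GB>
   /pci@30e/pci@2/usb@0/storage@1/disk@0,0
7. c3t0d0 <MICRON-eUSB DISK-1112-1.89GB>
   /pci@313/pci@2/usb@0/storage@1/disk@0,0
"""


def parse_format_listing(text):
    """Hypothetical helper: map each cXtYdZ name to its device path."""
    disks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"\d+\.\s+(c\d+t\d+d\d+)\s+<", line)
        if m:
            current = m.group(1)
        elif line.startswith("/") and current:
            disks[current] = line
            current = None
    return disks


def disk_from_slice(dev_path):
    """Hypothetical helper: '/dev/dsk/c2t0d0s0' -> 'c2t0d0'."""
    name = dev_path.rsplit("/", 1)[-1]
    return re.sub(r"s\d+$", "", name)


disks = parse_format_listing(listing)
target = disk_from_slice("/dev/dsk/c2t0d0s0")  # from 'zpool status'
print(target, "->", disks.get(target))
```

The printed path can then be checked against the root complex of the replaced CMIOU using the device path note referenced earlier.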
References
<NOTE:2094649.1> - SPARC T7 / M7 Servers : How to install Solaris on a Physical Domain using VersaBoot - iSCSI over IPoIB
<NOTE:2094741.1> - SPARC T7 / M7 / M8 Servers : Information about VersaBoot - iSCSI over IPoIB
<NOTE:2107700.1> - SPARC M7 Servers : iSCSI over IPoIB - CMIOU/eUSB replacement considerations

Attachments
This solution has no attachment