Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-75-1596826.1
Update Date:2017-08-08
Keywords:

Solution Type  Troubleshooting Sure

Solution  1596826.1 :   ODA Disk Issue: Suggested Common Disk Diagnostic Commands or Scripts with Tips, Best Practice and Examples  


Related Items
  • Oracle Database Appliance X4-2
  • Oracle Database Appliance
  • Oracle Database Appliance Software
  • Oracle Database Appliance X3-2
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>Oracle Database Appliance>DB: ODA_EST


As of ODA / OAK 2.10, the oakcli stordiag command is the single source of truth for disk conditions, disk diagnostics and troubleshooting.
For older (pre-2.10) versions there are many different approaches to disk-based problems, including multiple diagnostics and commands. However, not every command provides the information it appears to when troubleshooting ODA disk problems. This is a living document intended to catalog some of the more commonly used commands and to provide examples and a synopsis of their usefulness in different situations. The note is currently meant more for casual review than for rigorous troubleshooting, because new commands are still being found and cataloged; it is therefore internal only at this time.

In this Document
Purpose
Troubleshooting Steps
 Disk Commands for Trouble Shooting
 odasundiag
 oakcli
 fw and disk
 mapper
 mpath
 OS Disks
 oakcli
 mdstat
 mdadm
 fwupdate
 lsscsi  
 JBOD / Storage Shelf
References


Applies to:

Oracle Database Appliance - Version All Versions to All Versions [Release All Releases]
Oracle Database Appliance Software - Version 2.1.0.1 to 12.1.2.7 [Release 2.1 to 12.1]
Oracle Database Appliance X4-2 - Version All Versions to All Versions [Release All Releases]
Oracle Database Appliance X3-2 - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Purpose

To describe commands that can be used for debugging ODA disk issues
- Intended to help differentiate between various usage and problem scenarios.

Please use the following note for required Disk Diagnostic information

Note 1390058.1 - Oracle Database Appliance Diagnostic Information required for Disk Failures

 

MINIMUM DISK REPLACEMENT DIAGNOSTICS

 

If oakd is running on both nodes:

  • oakcli show env_hw
  • /root/Extras/odasundiag.sh    < DiskDiag collection
  • oakcli stordiag e#_pd_S#      < where e# is e0 or e1 (enclosure 0 or 1) and S# is the slot number

If more than one disk is affected:

  • oakcli manage diagcollect --storage


For any PROBLEM disk whether identified by the above outputs or from errors found in trace files or the ASM alert.logs:

oakcli stordiag outputs:

If you are using an X3-2, X4-2 or higher, upload the output of oakcli stordiag e<0|1>_pd_<slot#>

Example for ODA X3-2, X4-2 or X5-2:

oakcli stordiag e0_pd_02                  (if the disk is in the SECOND JBOD, use e1_pd_02)


If you are using a V1, upload the output of oakcli stordiag pd_<slot#>

Example for ODA V1:

 oakcli stordiag pd_02   
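The resource name passed to oakcli stordiag differs by model. As a sketch, the hypothetical shell function stordiag_name below (it is not an OAK command) builds the name from an enclosure and a slot, assuming the conventions described above: pd_<slot> on V1 and e<0|1>_pd_<slot> on X3-2 and later, with slots 00 to 23.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an OAK command): build the disk resource name
# used by "oakcli stordiag".
#   stordiag_name v1 2   -> pd_02      (ODA V1)
#   stordiag_name 0 2    -> e0_pd_02   (first JBOD, X3-2 / X4-2 / X5-2)
#   stordiag_name 1 11   -> e1_pd_11   (second JBOD)
stordiag_name() {
    local enclosure=$1 slot=$2
    # Slots are numeric and run from 0 to 23; 10# forces base 10 so "08" is not read as octal.
    if ! [[ $slot =~ ^[0-9]+$ ]] || (( 10#$slot > 23 )); then
        echo "invalid slot: $slot" >&2
        return 1
    fi
    local padded
    padded=$(printf '%02d' "$((10#$slot))")
    case $enclosure in
        v1|V1) echo "pd_${padded}" ;;
        0|1)   echo "e${enclosure}_pd_${padded}" ;;
        *)     echo "invalid enclosure: $enclosure" >&2; return 1 ;;
    esac
}

# Example (as root): oakcli stordiag "$(stordiag_name 0 2)"
```

Zero-padding the slot matches the resource names listed by oakcli show disk (pd_00 to pd_23), so a slot typed as 2 still becomes pd_02.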



  


Suggested:

Helps confirm the scope of the problem impacting ASM

  • ASM Alert.logs for both nodes  < Included with oakcli manage diagcollect --storage


Supplemental information for diagnosing disk problems that may or may not require disk replacement

  • oakcli diskgroup information

    oakcli show diskgroup DATA
    oakcli show diskgroup RECO
    oakcli show diskgroup REDO

 

If you cannot run OAK 

  •  ASM Alert.logs

    Node0 /u01/app/grid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log
    Node1 /u01/app/grid/diag/asm/+asm/+ASM2/trace/alert_+ASM2.log


On both nodes, as root:

  • ./odasundiag.sh

 

 TIP: Before adding a disk back into the ODA, wait until you can confirm that the old disk has been removed.
       You can do this by checking the removal times in oakd.log:

  • grep "desc:" oakd.log
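The grep above returns only the desc: lines. A minimal sketch, assuming (as in the LSI event lines of the stordiag example later in this note) that each desc: line directly follows a line beginning with the event timestamp: the hypothetical lsi_events function prints each description next to its timestamp, which makes removal times easier to read.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "timestamp | description" for every LSI event
# "desc:" line in oakd.log. Assumes each desc: line directly follows a line
# that begins with "YYYY-MM-DD HH:MM:SS.mmm:", as in the stordiag LSI events.
lsi_events() {
    awk '
        /desc:/ {
            text = $0
            sub(/^.*desc:[ \t]*/, "", text)    # keep only the event text
            ts = prev_date " " prev_time
            sub(/:$/, "", ts)                  # drop the colon after the time
            print ts " | " text
        }
        { prev_date = $1; prev_time = $2 }     # remember this line for the next one
    ' "$@"
}

# Example: lsi_events /opt/oracle/oak/log/<HOSTNAME>/oak/oakd.log | tail -n 20
```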

 

  

 

 

 

 

1) Confirm the HW version and deployment type (Bare Metal (BM) or Virtualized Platform (ODAVP)).

    Confirm that both nodes are patched to the proper level AND match each other.
    - It is especially important that the shared disks show consistent, up-to-date versions (or a version higher than expected).

2) Stordiag runs several checks on EACH NODE for the disk; as of 2.10 and higher it is considered the best 'single source of truth' apart from HW checks.
    Stordiag also provides the most detailed analysis of FW, ASM, LSI multipath and other factors important to disk health and status.


Example:
    oakcli stordiag pd_11                        << here 11 is the disk / slot number being reviewed for replacement

Example:
    oakcli stordiag e1_pd_11                    << with more than one JBOD / storage shelf, e[0 | 1] depends on the JBOD number; use the ASM ALERT.LOG to confirm the exact disk name

 

3) ASM is critical for OVERALL ODA STABILITY and HEALTH.

Attention may be focused on one disk, but if replacing that disk exceeds the redundancy of the ASM diskgroups, a much more serious problem can occur, bringing down the database or worse.

IMPORTANT: Inspect the ASM alert.logs on BOTH nodes to confirm that
 a. ASM is up and running on both nodes
 b. ALL DISKGROUPS (DATA, RECO and REDO) are up, running and online
 c. The number of bad or offlined disks does not exceed the REDUNDANCY rules

4) Checking BOTH nodes can quickly confirm whether the problem affects more than a single disk and whether it is isolated to a single node.
   Problems showing on only a single node tend to be FALSE POSITIVES.

5) DISKDIAG is used by both EEST and HW and should always be one of the first pieces of information collected

 

Troubleshooting Steps

 

On ODA, with the latest releases, we depend on OAK as the single source of truth for any shared disk failures.

Therefore:

  1. Case A ==> IF you are using an older OAK release and OAK reports failures, make sure that you upgrade to the latest OAK release (2.10 or later).
  2. Case B ==> As of 2.10, we don't have smartctl running. You can and should ignore smartctl errors and messages in pre-2.10 versions.
  3. Case C ==> If OAK 2.10 reports a smartctl failure, we will ask our HW team to get involved and help the customer with the issue.

It is possible in the above cases that there is a physical failure that some older versions of OAK could not detect.
- For this reason we strongly recommend upgrading to 2.10 and, as appropriate, opening an SR with any questions you may have.

  

Disk Commands for Trouble Shooting

odasundiag

./odasundiag.sh 

oakcli

  oakcli  commands are issued as root and can be executed from /opt/oracle/oak/bin/

# oakcli manage diagcollect --storage                     -- Storage diagnostic collection (includes the ASM alert.logs)
# oakcli stordiag e#_pd_XX                                -- e# is e0 or e1 (enclosure 0 or 1) and XX is the slot number; use pd_XX on ODA V1
# oakcli show env_hw                                      -- Confirms the HW version and, on ODAVP, where the command is being issued from
# oakcli validate -c StorageTopology
# oakcli validate -c SharedStorage
# oakcli show validation storage
# oakcli validate -c OSDiskStorage
# oakcli validate -v -c OSDiskStorage                     -- For boot / system disks
# oakcli show validation storage -errors                  -- 2.8+: shows hard failure errors
# oakcli show storage -errors                             -- 2.7: shows hard failure errors
# oakcli show diskgroup                                   -- Should list the three groups (DATA, RECO, REDO)
# oakcli show diskgroup [ DATA | RECO | REDO ]            -- Select one of the three groups to get its individual disk details

# oakcli show validation storage failures | grep <disk resource name>   -- Failures for an individual resource
# oakcli show validation storage failures                 -- Shows ALL soft errors

# oakcli addasmdisk ....

fw and disk

# fwupdate list disk                -- Lists SYSTEM disks + Disk ID, Slot, Size, FW version, plus controller information,
                                       plus all disks: Name, Path, Type, state, state_details; ID, chassis, slot, type
                                       (Not supported on V1)
# smartctl -a /dev/s...
  
ASM

SQLPLUS>    gv$asm_disk | gv$asm_diskgroup | gv$asm_operation      -- views to query
# view /opt/oracle/extapi/asmappl.config              -- both nodes
asmcmd> lsdg
asmcmd> volinfo --all
asmcmd> lsof
asmcmd> volenable --all
kfed read /dev/mapper/..D_E......[ p1 or p2 ]                 --  optionally pipe to | head -53

mapper

 ls -l /dev/mapper/*D*

More from /dev/mapper (run these from inside the directory, after cd /dev/mapper):
# ls -al           -- By name
# ls -altr         -- By time
# ls -altr *S2*    -- Slots S20-S23 only, to check the SSDs
# ls -al /dev/mapper/*D*

mpath

# ls -al /dev/mpath

OS Disks

oakcli
# /opt/oracle/oak/bin/oakcli validate -v -c OSDiskStorage
  
mdstat
# cat /proc/mdstat
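On the OS (boot) disks, /proc/mdstat marks a missing or failed RAID member with an underscore in the status field, e.g. [U_] instead of [UU]. The hypothetical degraded_md function below is a minimal sketch that lists only the affected arrays, so you know which device to pass to mdadm --detail.

```shell
#!/usr/bin/env bash
# Minimal sketch: list md arrays whose member-status field in /proc/mdstat
# contains "_" (a missing or failed member), e.g. "[U_]" instead of "[UU]".
# Takes an optional file argument so saved mdstat output can be checked too.
degraded_md() {
    awk '
        /^md[0-9]+ :/ { dev = $1; next }         # e.g. "md1 : active raid1 sda2[0]"
        dev != "" && $NF ~ /^\[[U_]+\]$/ {      # e.g. "... blocks [2/1] [U_]"
            if ($NF ~ /_/) print dev, $NF
            dev = ""
        }
    ' "${1:-/proc/mdstat}"
}

# Example: degraded_md    then run mdadm --detail /dev/<array> for each array listed
```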
mdadm
# mdadm --detail /dev/md0
# mdadm --detail /dev/md1
fwupdate
# fwupdate list all           -- Shows versions for all components, including disks
# fwupdate list disk          -- Lists SYSTEM disks + Disk ID, Slot, Size, FW version, plus controller information
                                 plus all disks: Name, Path, Type, state, state_details; ID, chassis,slot,type
 
lsscsi  

[root@oda1 ~]# lsscsi -v | grep 600G                                ----- For the X3-2 this is issued on BM or DOM0 if using ODAVP

e.g.
[2:0:0:0] disk HITACHI H109060SESUN600G A606 /dev/sdcs
[2:0:1:0] disk HITACHI H109060SESUN600G A606 /dev/sdct

Disk Diag request bundled commands

 

Use the following data collection when you are diagnosing general storage and disk issues,
rather than problems that appear specific to one disk, UNLESS the problem is with a replacement disk.

Please send the following files:

/opt/oracle/oak/onecmd/tmp/*
/etc/multipath.conf
/opt/oracle/extapi/asmappl.config
/opt/oracle/oak/log/<HOSTNAME>/oak/oakd* <-- make sure you put your hostname in the path, if this path was created
/opt/oracle/oak/log/test/oak/oakd*
/opt/oracle/oak/conf/validation_props.xml
/var/log/messages*

Output from the following commands:
multipath -l
ls -l /dev/mapper/*
fwupdate list all
oakcli validate -c storagetopology
oakcli show version -detail
  
 

JBOD / Storage Shelf

# fwupdate list expander  -- Confirms the shelf/JBOD: ID, Chassis, Slot, Expander Name, FW Version, Manufacturer, Model (e.g. DE2-24P)
 
# oakcli validate -c SharedStorage
 

Disk Layouts

Note that the ODA X4-2 and X3-2 both use the same DE2-24P Storage Shelf.

[Figures: X3-2 and X4-2 DE2-24P storage shelf, plus the optional second shelf for X3-2 and higher; V1 storage built into the ODA hardware]

ALSO useful

  • ./odasundiag.sh      -- the best single-source collection
  •  oakd.log
  •  +ASM1 and +ASM2  ALERT.LOGs 
  •  fwupdate list     [ disk | controller | all ]  


Note: This note provides context and examples for various ODA commands.
Perhaps the best information for diagnosing ODA disk issues can be extracted using the steps and commands in the following notes:


oakcli show storage
lists a controller only if it sees the disks attached to it.

 

oakcli stordiag <resource_name>

 Usage:      oakcli stordiag -h | n
                      -h : Help Message
                       n : oakd disk resource name
                       resource name format :   pd_[0..23]

For X5-2, X4-2 or X3-2 use: e[ 0 | 1]_pd_[0..23]   (e0 for the first JBOD enclosure, e1 for the second)

                                                  


Example (ODA V1; newer X5-2, X4-2 or X3-2 hardware uses oakcli stordiag e0_pd_## for the first JBOD enclosure and e1_pd_## for the second, where ## is the disk number 00 up to 23):

[root@odarm1 ~]# oakcli stordiag pd_01


  Node Name : odarm1
   Test : Diagnostic Test Description

   1  : OAK Check
        NAME             PATH               TYPE             STATE           STATE_DETAILS
        pd_01            /dev/sdaw       HDD             ONLINE          Good

   2  : ASM Check
        ASM Disk Status                        :  state   mode_s  mount_s header_s

   3  : Smartctl Health Check
        SMART Health Status: FAILURE PREDICTION THRESHOLD EXCEEDED: ascq=0x5 [asc=5d, ascq=5]

   4  : Multipathd Status
        multipathd running on system

   5  : Multipath Status
        Device List : /dev/sdm   /dev/sdaw
        Info:
             HDD_E0_S01_975092811 (35000c5003a1ebc4b) dm-14 SEAGATE,ST360057SSUN600G
             size=559G features='0' hwhandler='0' wp=rw
             |-+- policy='round-robin 0' prio=1 status=active
             | `- 6:0:10:0 sdm  8:192  active ready running
             `-+- policy='round-robin 0' prio=1 status=enabled
               `- 7:0:23:0 sdaw 67:0   active ready running
        IO Test Result:
                      /dev/sdm  : PASS
                      /dev/sdaw : PASS

   6  : Check Partition using fdisk
        Check using active device path: /dev/sdm
        Partition check on device /dev/sdm  :  FAIL
        Partition list found by fdisk for active device path: /dev/sdm
                Device Boot      Start         End      Blocks   Id  System

        Check using passive device path: /dev/sdaw
        Partition check on device /dev/sdaw  :  FAIL

        Partition list found by fdisk for passive device path: /dev/sdaw
                 Device Boot      Start         End      Blocks   Id  System

   7  : Device Mapper Diagnostics
        Mapper Device : dm-14
        IO Test Result:
                      /dev/dm-14 : PASS
          [INFO]: No partition seen in /dev/mapper directory

   8  : fwupdate
        ID        Manufacturer   Model                Chassis Slot   Type   Media       Size (GB)       FW Version XML Support
        c1d1      SEAGATE        ST360057SSUN600G        0       1      sas   HDD     600             0B25            N/A
        c2d1      SEAGATE        ST360057SSUN600G        0       1      sas   HDD     600             0B25            N/A

   9  : Fishwrap
          Controller "mpt2sas:0d:00.0"
                Disk  /dev/sdm: SEAGATE ST360057SSUN600G (s/n "001116E0SQHG        6SL0SQHG"), bay 1
          Controller "mpt2sas:1f:00.0"
                Disk  /dev/sdaw: SEAGATE ST360057SSUN600G (s/n "001116E0SQHG        6SL0SQHG"), bay 1

  10  : SCSI INQUIRY
        Active multipath device /dev/sdm      :  PASS
        Passive multipath device /dev/sdaw  :  PASS

  11  : Multipath Conf for device
           multipath {
             wwid 35000c5003a1ebc4b
             alias HDD_E0_S01_975092811
       }

  12  : Last five LSI Events Received for slot 1
        oakd.l02-2013-08-23 22:28:10.255: [ ADAPTER][1217067328] H Received new event from LSI: Ctrl id: C0
        oakd.l02: desc: Predictive failure: PD 0a(e2/s1)
        oakd.l02-2013-08-23 22:28:10.315: [ ADAPTER][1217067328] H Received new event from LSI: Ctrl id: C0
        oakd.l02: desc: Predictive failure: PD 0a(e2/s1)
        oakd.l02-2013-08-23 22:50:13.752: [ ADAPTER][1217067328] H Received new event from LSI: Ctrl id: C1
        oakd.l02: desc: Predictive failure: PD 17(e3/s1)
        oakd.l02-2013-08-23 22:50:13.753: [ ADAPTER][1217067328] H Received new event from LSI: Ctrl id: C1
        oakd.l02: desc: Predictive failure: PD 17(e3/s1)

  13  : Version Information
          OAK                   :  2.7.0.0.0
          kernel                :  2.6.39-400.111.1.el5uek
          mpt2sas             :  16.05.01.00
          Multipath            :  0.4.9
          Disk Firmware    :  0B25

  14  : OAK Conf Parms
        Device    : queue_depth   Timeout   max_sectors_kb   nr_requests   read_ahead_kb   scheduler
        /dev/sdm  :     32          32          1024             4096           128         noop [deadline] cfq
        /dev/sdaw :     32          32          1024             4096           128         noop [deadline] cfq


          ******************************
          ********** 2nd NODE **********
          ******************************


The authenticity of host '192.168.16.25 (192.168.16.25)' can't be established.
RSA key fingerprint is dd:5b:37:cc:85:6b:b1:c4:8e:80:66:27:7f:b1:37:23.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.16.25' (RSA) to the list of known hosts.
 Node Name : odarm2
 Test : Diagnostic Test Description

   1  : OAK Check
        NAME              PATH               TYPE             STATE            STATE_DETAILS
        pd_01           /dev/sdaw        HDD             ONLINE          Good

   2  : ASM Check
        ASM Disk Status                        :  state   mode_s  mount_s header_s

   3  : Smartctl Health Check
        SMART Health Status: FAILURE PREDICTION THRESHOLD EXCEEDED: ascq=0x5 [asc=5d, ascq=5]

   4  : Multipathd Status
        multipathd running on system

   5  : Multipath Status
        Device List : /dev/sdm   /dev/sdaw
        Info:
             HDD_E0_S01_975092811 (35000c5003a1ebc4b) dm-14 SEAGATE,ST360057SSUN600G
             size=559G features='0' hwhandler='0' wp=rw
             |-+- policy='round-robin 0' prio=1 status=active
             | `- 6:0:10:0 sdm  8:192  active ready running
             `-+- policy='round-robin 0' prio=1 status=enabled
               `- 7:0:23:0 sdaw 67:0   active ready running
        IO Test Result:
                      /dev/sdm  : PASS
                      /dev/sdaw : PASS

   6  : Check Partition using fdisk
        Check using active device path: /dev/sdm
        Partition check on device /dev/sdm  :  FAIL
        Partition list found by fdisk for active device path: /dev/sdm
                Device Boot      Start         End      Blocks   Id  System
        Check using passive device path: /dev/sdaw
        Partition check on device /dev/sdaw  :  FAIL
        Partition list found by fdisk for passive device path: /dev/sdaw
                 Device Boot      Start         End      Blocks   Id  System

   7  : Device Mapper Diagnostics
        Mapper Device : dm-14
        IO Test Result:
                      /dev/dm-14 : PASS
          [INFO]: No partition seen in /dev/mapper directory

   8  : fwupdate
        ID        Manufacturer   Model                Chassis Slot   Type   Media       Size (GB) FW Version XML Support
        c1d1      SEAGATE        ST360057SSUN600G    0           1      sas    HDD        600       0B25       N/A
        c2d1      SEAGATE        ST360057SSUN600G    0           1      sas    HDD        600       0B25       N/A

   9  : Fishwrap
          Controller "mpt2sas:0d:00.0"
                Disk  /dev/sdm: SEAGATE ST360057SSUN600G (s/n "001116E0SQHG        6SL0SQHG"), bay 1
          Controller "mpt2sas:1f:00.0"
                Disk  /dev/sdaw: SEAGATE ST360057SSUN600G (s/n "001116E0SQHG        6SL0SQHG"), bay 1

  10  : SCSI INQUIRY
        Active multipath device /dev/sdm  :  PASS
        Passive multipath device /dev/sdaw  :  PASS

  11  : Multipath Conf for device
           multipath {
             wwid 35000c5003a1ebc4b
             alias HDD_E0_S01_975092811
       }

  12  : Last five LSI Events Received for slot 1
        oakd.l02-2013-08-23 22:28:10.240: [ ADAPTER][972740928] H Received new event from LSI: Ctrl id: C0
        oakd.l02: desc: Predictive failure: PD 0a(e2/s1)
        oakd.l02-2013-08-23 22:28:10.247: [ ADAPTER][972740928] H Received new event from LSI: Ctrl id: C0
        oakd.l02: desc: Predictive failure: PD 0a(e2/s1)
        oakd.l02-2013-08-23 22:28:37.846: [ ADAPTER][972740928] H Received new event from LSI: Ctrl id: C1
        oakd.l02: desc: Predictive failure: PD 17(e3/s1)
        oakd.l02-2013-08-23 22:28:37.847: [ ADAPTER][972740928] H Received new event from LSI: Ctrl id: C1
        oakd.l02: desc: Predictive failure: PD 17(e3/s1)

  13  : Version Information
          OAK              :  2.7.0.0.0
          kernel           :  2.6.39-400.111.1.el5uek
          mpt2sas          :  16.05.01.00
          Multipath        :  0.4.9
          Disk Firmware    :  0B25

  14  : OAK Conf Parms
        Device    : queue_depth   Timeout   max_sectors_kb   nr_requests   read_ahead_kb   scheduler
        /dev/sdm  :     32          32          1024             4096           128         noop [deadline] cfq
        /dev/sdaw :     32          32          1024             4096           128         noop [deadline] cfq

        Above details can also be found in log file=/opt/oracle/oak/log/odarm1/stordiag/stordiag-2013-10-31-11:01:13.log
[root@odarm1 ~]#
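In the Multipath Status test above, the disk shows two paths (sdm and sdaw), one through each controller. As a sketch, the hypothetical check_mpaths function scans multipath output in that same layout (for example from multipath -ll) and reports any HDD_/SSD_ alias with fewer paths than expected or with a path reported as failed or faulty. The default of two paths per shared disk is an assumption drawn from this example, not a documented OAK rule.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read multipath output in the format shown in stordiag
# test 5 and report any HDD_/SSD_ alias with fewer than the expected number of
# paths, or with a path reported as failed/faulty. Two paths per shared disk is
# an assumption based on the two controllers (sdm and sdaw) in the example.
check_mpaths() {
    local expected=${1:-2}
    awk -v want="$expected" '
        function report() {
            if (alias != "" && (paths < want || bad > 0))
                printf "%s paths=%d bad=%d\n", alias, paths, bad
        }
        $1 ~ /^(HDD|SSD)_E[0-9]+_S[0-9]+_/ { report(); alias = $1; paths = 0; bad = 0; next }
        /[0-9]+:[0-9]+:[0-9]+:[0-9]+ +sd[a-z]+/ {   # a path line, e.g. "6:0:10:0 sdm 8:192 ..."
            paths++
            if ($0 ~ /failed|faulty/) bad++
        }
        END { report() }
    '
}

# Example (as root): multipath -ll | check_mpaths
```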

 

 For ODA V1:           oakcli show disk pd_##                < where ## is the physical disk (pd) number; also good for a quick single-disk diagnostic
 X5-2, X4-2, X3-2:     oakcli show disk e0_pd_##  or e1_pd_## for disks on the second JBOD

 Comment

 Oracle Appliance Kit Command Line Interface (oakcli) is used exclusively for ODA configuration, installation,
 patching and administration.

oakcli commands are the preferred method for almost all ODA maintenance and administration, including the creation of databases,
and are mandatory for importing VM templates and for patching or upgrades.

 

oakcli show disk pd_xx  
 - where pd_xx is the name of the resource  

 

odasundiag.sh

 
odasundiag.sh 


Example Output from script:



 

Useful for debugging aspects of the problem -- usually after initial problem is at least partially understood

ls -l /dev/mapper/*D*         -- Generic: can be used to list all disks
ls -l /dev/mapper/HDD*        -- HDD only
ls -l /dev/mapper/SSD*        -- SSD only

  

Resource: pd_01
        ActionTimeout        :       600
        ActivePath           :       /dev/sdaw
        AsmDiskList          :       |data_01||reco_01|
        AutoDiscovery        :       1
        AutoDiscoveryHi      :       |data:43:HDD||reco:57:HDD||redo:100 :SSD|
        CheckInterval        :       300
        ColNum               :       1
        DiskId               :       35000c5003a1ebc4b
        DiskType             :       HDD
        Enabled              :       0
        ExpNum               :       0
        IState               :       0
        Initialized          :       0
        MonitorFlag          :       0
        MultiPathList        :       |/dev/sdaw||/dev/sdm|
        Name                 :       pd_01
        NewPartAddr          :       0
        OSUserType           :       |userType:Multiuser|
        PrevState            :       3
        PrevUsrDevName       :
        SectorSize           :       512
        SerialNum            :       001116E0SQHG
        Size                 :       600127266816
        SlotNum              :       1
        State                :       Online
        StateChangeTs        :       1382641202
        StateDetails         :       Good
        TotalSectors         :       1172123568
        TypeName             :       0
        UsrDevName           :       HDD_E0_S01_975092811
        gid                  :       0
        mode                 :       660
        uid                  :       0
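The resource listing above is long when only a few fields matter. The sketch below is hypothetical (disk_field and state_change_time are not OAK commands): it pulls one field out of saved oakcli show disk output and converts StateChangeTs, which appears to be a Unix epoch timestamp (an assumption), into a readable UTC date using GNU date.

```shell
#!/usr/bin/env bash
# Minimal sketch: read "oakcli show disk <resource>" output on stdin.
# disk_field prints the value of one field (State, StateDetails, MultiPathList, ...).
disk_field() {
    awk -v key="$1" '$1 == key { sub(/^[^:]*:[ \t]*/, ""); print; exit }'
}

# StateChangeTs appears to be a Unix epoch timestamp (an assumption);
# this converts it to a UTC date. Requires GNU date for "-d @<seconds>".
state_change_time() {
    local ts
    ts=$(disk_field StateChangeTs)
    date -u -d "@${ts}" '+%Y-%m-%d %H:%M:%S UTC'
}

# Example (as root):
#   oakcli show disk pd_01 > /tmp/pd_01.txt
#   disk_field StateDetails < /tmp/pd_01.txt
#   state_change_time < /tmp/pd_01.txt
```

For the pd_01 listing above, StateChangeTs 1382641202 converts to 2013-10-24 19:00:02 UTC.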

 

oakcli show disk

Not Recommended : Potentially misleading

oakcli show disk                             -- Quick confirmation of the physical disks known to oakcli. If a disk is physically removed, you
                                                         might notice a missing reference, i.e. a gap in the physical disk identifiers.
                                                         However, if you did not know that the range is pd_00 up to pd_23, you might miss one or more
                                                         missing disks at the beginning or end of the range.
                                                         Worst case, you might use this command and believe that it confirms there are no problems with
                                                         any of the disks in use: this is not a good query for determining disk health.
                                                         The command appears only to show whether a disk is in the slot and recognized as physically present.



Of moderate usefulness: can be used for supplemental diagnosis

ls -l /dev/mapper/HDD*        -  Lists HDD disks but gives no explicit warning of a missing disk

 Use

                                                         Provides no evidence of the problem source
                                                         Output is not consecutive, so it is not easy to spot which disk, if any, is missing
                                                         Does not list SSDs
                                                         Can be used to provide details on the currently detected mapped disks, including:
                                                           permissions;
                                                           owner and group   -- should be grid and asmadmin;
                                                           date the disk was first created(?) | date mapped post ASM(?);
                                                           device name, path(s) and node #

Example:

 ls -l /dev/mapper/*D*     

  

Good

[root@odax3rm1 ~]#  ls -l /dev/mapper/*D*
brw-rw---- 1 grid asmadmin 252,  23 Oct 24 11:59 /dev/mapper/HDD_E0_S00_372932264
brw-rw---- 1 grid asmadmin 252,  99 Nov  1 02:03 /dev/mapper/HDD_E0_S00_372932264p1
brw-rw---- 1 grid asmadmin 252, 100 Nov  1 02:03 /dev/mapper/HDD_E0_S00_372932264p2
brw-rw---- 1 grid asmadmin 252,   7 Oct 24 12:00 /dev/mapper/HDD_E0_S01_373745920
brw-rw---- 1 grid asmadmin 252,  74 Nov  1 02:03 /dev/mapper/HDD_E0_S01_373745920p1
brw-rw---- 1 grid asmadmin 252, 101 Nov  1 02:03 /dev/mapper/HDD_E0_S01_373745920p2
...

Bad

[root@odarm1 ~]#  ls -l /dev/mapper/*D*
brw-rw---- 1 grid asmadmin 252, 26 Sep 10 19:17 /dev/mapper/HDD_E0_S00_975281119
brw-rw---- 1 grid asmadmin 252, 30 Oct 31 11:49 /dev/mapper/HDD_E0_S00_975281119p1
brw-rw---- 1 grid asmadmin 252, 33 Oct 31 11:21 /dev/mapper/HDD_E0_S00_975281119p2
brw-rw---- 1 grid asmadmin 252, 14 Oct 17 23:18 /dev/mapper/HDD_E0_S01_975092811   << only one of the three entries for slot S01: p1 and p2 are missing
brw-rw---- 1 grid asmadmin 252, 27 Sep 10 19:17 /dev/mapper/HDD_E0_S04_975101159   << Not sequential: S00, S01 (partial), then S04. S02 and S03 are in E1 and are listed later, so gaps are hard to spot
brw-rw---- 1 grid asmadmin 252, 35 Oct 31 11:49 /dev/mapper/HDD_E0_S04_975101159p1
brw-rw---- 1 grid asmadmin 252, 37 Oct 31 11:21 /dev/mapper/HDD_E0_S04_975101159p2
...
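Spotting a missing p1 or p2 by eye is error-prone in a long listing. The hypothetical missing_partitions function below is a minimal sketch that reads ls -l /dev/mapper/*D* output and reports disks whose partitions are missing, assuming (from the listings in this note) that each HDD should have p1 and p2 and each SSD should have p1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "ls -l /dev/mapper/*D*" output on stdin and report
# disks whose partitions are missing. Expected partitions (HDD: p1 and p2,
# SSD: p1) are an assumption taken from the healthy listings in this note.
missing_partitions() {
    awk '
        {
            name = $NF
            sub(/^.*\//, "", name)                  # strip /dev/mapper/
            if (name !~ /^(HDD|SSD)_E[0-9]+_S[0-9]+_[0-9]+/) next
            base = name
            sub(/p[0-9]+$/, "", base)               # ..._975092811p1 -> ..._975092811
            if (base == name) disk[base] = 1        # whole-disk entry
            else parts[base]++                      # partition entry
        }
        END {
            for (b in disk) {
                want = (b ~ /^HDD/) ? 2 : 1
                if (parts[b] + 0 < want)
                    printf "%s partitions=%d expected=%d\n", b, parts[b] + 0, want
            }
        }
    ' | sort
}

# Example: ls -l /dev/mapper/*D* | missing_partitions
```

Run against the Bad listing above, it would report HDD_E0_S01_975092811 with 0 of 2 expected partitions.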



Alternative

 ls -l /dev/mapper/*D*   -- like HDD* above, but also includes the Solid State Disks (SSDs), which are used for the online log files

...
...
brw-rw---- 1 grid asmadmin 252,  4 Oct 17 23:18 /dev/mapper/SSD_E0_S20_805650933 -- missing p1
brw-rw---- 1 grid asmadmin 252, 12 Oct 17 23:18 /dev/mapper/SSD_E0_S21_805650925
brw-rw---- 1 grid asmadmin 252, 51 Oct 17 23:18 /dev/mapper/SSD_E0_S21_805650925p1
brw-rw---- 1 grid asmadmin 252, 10 Sep 10 19:17 /dev/mapper/SSD_E1_S22_805650984
brw-rw---- 1 grid asmadmin 252, 56 Oct 31 11:49 /dev/mapper/SSD_E1_S22_805650984p1
brw-rw---- 1 grid asmadmin 252,  5 Oct 17 23:18 /dev/mapper/SSD_E1_S23_805622321
brw-rw---- 1 grid asmadmin 252, 50 Oct 17 23:18 /dev/mapper/SSD_E1_S23_805622321p1
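Because ls sorts by enclosure first, a missing slot is easy to overlook. As a sketch, the hypothetical missing_slots function lists the slot numbers S00 to S23 that do not appear at all. It assumes the naming seen in this note: on V1 each slot appears once across E0 and E1, while on X3-2 and later each JBOD has its own S00 to S23, so pass 0 or 1 to check one JBOD at a time.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "ls -l /dev/mapper/*D*" output on stdin and list
# slot numbers S00-S23 that do not appear, regardless of how ls sorted them.
# Optional argument: an enclosure number (0 or 1) to check one JBOD at a time.
# Assumption: on V1 each slot appears once across E0/E1, so run it without an
# argument there; on X3-2 and later each JBOD has its own S00-S23.
missing_slots() {
    local enc=${1:-}
    awk -v enc="$enc" '
        {
            name = $NF
            sub(/^.*\//, "", name)                          # strip /dev/mapper/
            if (enc != "" && name !~ ("_E" enc "_")) next   # other enclosure
            if (match(name, /_S[0-9][0-9]_/))
                seen[substr(name, RSTART + 2, 2) + 0] = 1   # "_S04_" -> 4
        }
        END { for (s = 0; s <= 23; s++) if (!(s in seen)) printf "S%02d\n", s }
    '
}

# Example: ls -l /dev/mapper/*D* | missing_slots      (V1)
#          ls -l /dev/mapper/*D* | missing_slots 1    (second JBOD)
```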

 

Doing a Count of the DISKS


ls -l /dev/mapper/*D* |wc -l

Of moderate usefulness: can be used for supplemental diagnosis

ls -l /dev/mapper/HDD* | wc -l    -- gives a count of the HDD entries (whole disk plus partitions) known to the mapper on that node;
                                     the count can easily be compared with the known count for a good, functioning configuration
ls -l /dev/mapper/HDD* |wc -l
58    -- check for validity
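The count is only meaningful against an expected value. The listings in this note suggest each HDD produces three mapper entries (disk, p1, p2) and each SSD two (disk, p1); the X3-2 example that follows is consistent with that (40 HDDs give 120 entries, and adding 8 SSDs gives 136). The minimal sketch below, using hypothetical function names, computes the expectation under that assumption and compares it with a live count.

```shell
#!/usr/bin/env bash
# Minimal sketch: expected number of /dev/mapper/*D* entries, assuming each
# HDD has three entries (disk, p1, p2) and each SSD two (disk, p1).
expected_mapper_entries() {
    local hdd=$1 ssd=$2
    echo $(( hdd * 3 + ssd * 2 ))
}

# Compare a live count with the expectation for a given disk population.
compare_mapper_count() {
    local hdd=$1 ssd=$2 actual=$3 want
    want=$(expected_mapper_entries "$hdd" "$ssd")
    if (( actual == want )); then
        echo "OK $actual"
    elif (( actual < want )); then
        echo "MISSING $(( want - actual )) of $want"
    else
        echo "EXTRA $(( actual - want )) over $want"
    fi
}

# Example for V1 (20 HDDs, 4 SSDs):
#   compare_mapper_count 20 4 "$(ls -l /dev/mapper/*D* | wc -l)"
```

With 20 HDDs and 4 SSDs (the V1 layout shown by oakcli show disk later in this note) the expected total is 68. A matching count does not by itself prove that every disk is healthy.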


Example from X3-2 with a Second JBOD

[root@odax ~]#  ls -l /dev/mapper/HDD* |wc -l
120


[root@odax ~]#  ls -l /dev/mapper/*D* |wc -l
136

However, the counts above give no information pointing to the problem: the MISSING SSD in slot 23 (the last slot), or which JBOD that disk is in.

[root@odax3rm1-net1 ~]# oakcli show diskgroup redo
        ASM_DISK        PATH                                            DISK            STATE           STATE_DETAILS

        e0_redo_20      /dev/mapper/SSD_E0_S20_805852554p1              e0_pd_20        ONLINE          Good
        e0_redo_21      /dev/mapper/SSD_E0_S21_805852541p1              e0_pd_21        ONLINE          Good
        e0_redo_22      /dev/mapper/SSD_E0_S22_805852510p1              e0_pd_22        ONLINE          Good
        e0_redo_23      /dev/mapper/SSD_E0_S23_805852551p1              e0_pd_23        ONLINE          Good
        e1_redo_20      /dev/mapper/SSD_E1_S20_805861591p1              e1_pd_20        ONLINE          Good
        e1_redo_21      /dev/mapper/SSD_E1_S21_805861578p1              e1_pd_21        ONLINE          Good
        e1_redo_22      /dev/mapper/SSD_E1_S22_805861570p1              e1_pd_22        ONLINE          Good
        e1_redo_23      /dev/mapper/SSD_E1_S23_805820183p1              e1_pd_23        FAILED          DiskRemoved
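When a diskgroup has many disks, a failed one can be lost among the ONLINE rows. A minimal sketch, assuming the column layout shown above (ASM_DISK, PATH, DISK, STATE, STATE_DETAILS): the hypothetical offline_asm_disks function prints only the disks whose STATE is not ONLINE.

```shell
#!/usr/bin/env bash
# Minimal sketch: filter "oakcli show diskgroup <name>" output (columns
# ASM_DISK PATH DISK STATE STATE_DETAILS, as above) down to disks not ONLINE.
offline_asm_disks() {
    awk 'NF >= 5 && $1 != "ASM_DISK" && $4 != "ONLINE" { print $1, $3, $4, $5 }'
}

# Example (as root), all three groups:
#   for dg in DATA RECO REDO; do oakcli show diskgroup "$dg" | offline_asm_disks; done
```

On the redo output above it would print only e1_redo_23 e1_pd_23 FAILED DiskRemoved.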

  

 

Examples for V1:

[root@oda1 ~]# ls -l /dev/mapper/*D* |wc -l

64

 

[root@oda1 ~]#  ls -l /dev/mapper/HDD* |wc -l

   << we are missing disks, but no details are provided about which disks, slots or state

 


ODA commands can report inconsistent or contradictory disk states

[root@odarm1 ~]# oakcli show disk

NAME             PATH             TYPE             STATE         STATE_DETAILS
pd_00           /dev/sdam          HDD             ONLINE          Good
pd_01           /dev/sdaw          HDD             ONLINE          Good    << This is  showing as GOOD using oakcli show DISK -- but is BAD using oakcli show diskgroup data
pd_02           /dev/sdaa          HDD             ONLINE          Good            
pd_03           /dev/sdak          HDD             ONLINE          Good
pd_04           /dev/sdan          HDD             ONLINE          Good
pd_05           /dev/sdax          HDD             ONLINE          Good
pd_06           /dev/sdab          HDD             ONLINE          Good
pd_07           /dev/sdal          HDD             ONLINE          Good
pd_08           /dev/sdao          HDD             ONLINE          Good
pd_09           /dev/sdau          HDD             ONLINE          Good
pd_10           /dev/sdac          HDD             ONLINE          Good
pd_11           /dev/sdai          HDD             ONLINE          Good
pd_12           /dev/sdap          HDD             ONLINE          Good
pd_13           /dev/sdav          HDD             ONLINE          Good
pd_14           /dev/sdad          HDD             ONLINE          Good
pd_15           /dev/sdaj          HDD             ONLINE          Good
pd_16           /dev/sdaq          HDD             ONLINE          Good
pd_17           /dev/sdas          HDD             ONLINE          Good
pd_18           /dev/sdae          HDD             ONLINE          Good
pd_19           /dev/sdag          HDD             ONLINE          Good
pd_20           /dev/sdar          SSD             ONLINE          Good
pd_21           /dev/sdat          SSD             ONLINE          Good
pd_22           /dev/sdaf          SSD             ONLINE          Good
pd_23           /dev/sdah          SSD             ONLINE          Good

However,  there is a problem ...

[root@odarm1 ~]# oakcli show diskgroup data

ASM_DISK             PATH                                      DISK             STATE         STATE_DETAILS
data_00         /dev/mapper/HDD_E0_S00_975281119p1              pd_00           ONLINE          Good
data_01         /dev/mapper/HDD_E0_S01_975092811p1              pd_01           OFFLINE         Bad               << This is reported as GOOD by oakcli show disk
data_02         /dev/mapper/HDD_E1_S02_975112619p1              pd_02           ONLINE          Good
data_03         /dev/mapper/HDD_E1_S03_975096419p1              pd_03           ONLINE          Good
data_04         /dev/mapper/HDD_E0_S04_975101159p1              pd_04           ONLINE          Good
data_05         /dev/mapper/HDD_E0_S05_975276323p1              pd_05           ONLINE          Good
data_06         /dev/mapper/HDD_E1_S06_975286719p1              pd_06           ONLINE          Good
data_07         /dev/mapper/HDD_E1_S07_975097763p1              pd_07           ONLINE          Good
data_08         /dev/mapper/HDD_E0_S08_975059895p1              pd_08           ONLINE          Good
data_09         /dev/mapper/HDD_E0_S09_975268579p1              pd_09           ONLINE          Good
data_10         /dev/mapper/HDD_E1_S10_975057759p1              pd_10           ONLINE          Good
data_11         /dev/mapper/HDD_E1_S11_975090571p1              pd_11           ONLINE          Good
data_12         /dev/mapper/HDD_E0_S12_975082431p1              pd_12           ONLINE          Good
data_13         /dev/mapper/HDD_E0_S13_975087695p1              pd_13           ONLINE          Good
data_14         /dev/mapper/HDD_E1_S14_975098135p1              pd_14           ONLINE          Good
data_15         /dev/mapper/HDD_E1_S15_975277375p1              pd_15           ONLINE          Good
data_16         /dev/mapper/HDD_E0_S16_975053479p1              pd_16           ONLINE          Good
data_17         /dev/mapper/HDD_E0_S17_975101955p1              pd_17           ONLINE          Good
data_18         /dev/mapper/HDD_E1_S18_975105863p1              pd_18           ONLINE          Good
data_19         /dev/mapper/HDD_E1_S19_975100435p1              pd_19           ONLINE          Good
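Because 'oakcli show disk' can report a disk as Good while ASM has it OFFLINE, it can help to compare the two outputs side by side. The sketch below is a minimal, hypothetical helper (not an Oracle tool): it parses text pasted from both commands and lists every disk that ASM does not report as ONLINE/Good, together with what 'oakcli show disk' says about the same disk. The column layouts are assumptions taken from the samples in this note and may differ between ODA releases.

```python
# Minimal sketch: cross-check 'oakcli show disk' against 'oakcli show diskgroup <dg>'.
# ASSUMPTION: column layouts are copied from the samples in this note; other
# ODA releases may print different columns.

def parse_show_disk(text):
    """Map physical disk name (pd_NN / e0_pd_NN / e1_pd_NN) -> (STATE, STATE_DETAILS)."""
    states = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: NAME PATH TYPE STATE STATE_DETAILS
        if len(fields) >= 5 and fields[0].startswith(("pd_", "e0_pd_", "e1_pd_")):
            states[fields[0]] = (fields[3], fields[4])
    return states

def parse_show_diskgroup(text):
    """Map physical disk name -> (ASM_DISK, STATE, STATE_DETAILS)."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: ASM_DISK PATH DISK STATE STATE_DETAILS [annotation]
        if len(fields) >= 5 and fields[1].startswith("/dev/mapper/"):
            rows[fields[2]] = (fields[0], fields[3], fields[4])
    return rows

def find_mismatches(disk_text, diskgroup_text):
    """List disks ASM does not report as ONLINE/Good, plus the
    'oakcli show disk' view of the same disk (None if not listed there)."""
    disk_view = parse_show_disk(disk_text)
    problems = []
    for pd, (asm_disk, state, detail) in sorted(parse_show_diskgroup(diskgroup_text).items()):
        if (state, detail) != ("ONLINE", "Good"):
            problems.append((pd, asm_disk, state, detail, disk_view.get(pd)))
    return problems

if __name__ == "__main__":
    # The device path for pd_01 is a placeholder; it is not shown in this note.
    show_disk = "pd_01           /dev/sdX           HDD             ONLINE          Good\n"
    show_dg = (
        "data_00         /dev/mapper/HDD_E0_S00_975281119p1              pd_00           ONLINE          Good\n"
        "data_01         /dev/mapper/HDD_E0_S01_975092811p1              pd_01           OFFLINE         Bad\n"
    )
    for row in find_mismatches(show_disk, show_dg):
        print(row)
    # prints ('pd_01', 'data_01', 'OFFLINE', 'Bad', ('ONLINE', 'Good'))
```

The same parser also accepts the e0_/e1_ style rows printed on systems with two storage shelves.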




====================

[root@odarm1 ~]# ls -l /dev/mapper/HDD*

 

brw-rw---- 1 grid asmadmin 252, 26 Sep 10 19:17 /dev/mapper/HDD_E0_S00_975281119
brw-rw---- 1 grid asmadmin 252, 30 Oct 31 11:20 /dev/mapper/HDD_E0_S00_975281119p1
brw-rw---- 1 grid asmadmin 252, 33 Oct 31 10:30 /dev/mapper/HDD_E0_S00_975281119p2
brw-rw---- 1 grid asmadmin 252, 14 Oct 17 23:18 /dev/mapper/HDD_E0_S01_975092811          << No p1/p2 partitions listed for S01, the disk behind the OFFLINE data_01
brw-rw---- 1 grid asmadmin 252, 27 Sep 10 19:17 /dev/mapper/HDD_E0_S04_975101159
brw-rw---- 1 grid asmadmin 252, 35 Oct 31 11:20 /dev/mapper/HDD_E0_S04_975101159p1
brw-rw---- 1 grid asmadmin 252, 37 Oct 31 10:30 /dev/mapper/HDD_E0_S04_975101159p2
...
...
brw-rw---- 1 grid asmadmin 252, 25 Sep 10 19:17 /dev/mapper/HDD_E1_S15_975277375
brw-rw---- 1 grid asmadmin 252, 34 Oct 31 11:20 /dev/mapper/HDD_E1_S15_975277375p1
brw-rw---- 1 grid asmadmin 252, 36 Oct 31 10:30 /dev/mapper/HDD_E1_S15_975277375p2
brw-rw---- 1 grid asmadmin 252, 20 Sep 10 19:17 /dev/mapper/HDD_E1_S18_975105863
brw-rw---- 1 grid asmadmin 252, 38 Oct 31 11:20 /dev/mapper/HDD_E1_S18_975105863p1
brw-rw---- 1 grid asmadmin 252, 39 Oct 31 10:30 /dev/mapper/HDD_E1_S18_975105863p2
brw-rw---- 1 grid asmadmin 252, 21 Sep 10 19:17 /dev/mapper/HDD_E1_S19_975100435
brw-rw---- 1 grid asmadmin 252, 48 Oct 31 11:20 /dev/mapper/HDD_E1_S19_975100435p1
brw-rw---- 1 grid asmadmin 252, 49 Oct 31 10:30 /dev/mapper/HDD_E1_S19_975100435p2
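A missing partition device, such as the absent p1/p2 entries for HDD_E0_S01 above, is easy to overlook in a long listing. The minimal sketch below scans pasted 'ls -l /dev/mapper' output and reports base devices whose expected partitions are missing. It assumes, from the listings in this note, that each HDD should appear as the base device plus p1 and p2, and each SSD as the base device plus p1 only; the function name is hypothetical.

```python
import re

# ASSUMPTION (from the listings in this note): HDDs carry p1 and p2, SSDs carry p1 only.
EXPECTED = {"HDD": {"p1", "p2"}, "SSD": {"p1"}}
NAME_RE = re.compile(r"\b((?:HDD|SSD)_E\d+_S\d+_\d+)(p\d+)?$")

def missing_partitions(ls_output):
    """Return {base device name: sorted list of missing partition suffixes}."""
    seen = {}
    for line in ls_output.splitlines():
        m = NAME_RE.search(line.strip())
        if not m:
            continue  # e.g. the 'control' node or unrelated lines
        base, part = m.group(1), m.group(2)
        parts = seen.setdefault(base, set())
        if part:
            parts.add(part)
    missing = {}
    for base, parts in seen.items():
        lacking = EXPECTED[base[:3]] - parts
        if lacking:
            missing[base] = sorted(lacking)
    return missing
```

Fed the lines above for S00 and S01, it would report HDD_E0_S01_975092811 as missing p1 and p2.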

 

[root@odax1 mpath]# ls -altr

0 lrwxrwxrwx  1 root root     7 Dec  9 15:10 SSD_E0_S22_805852510 -> ../dm-2
0 lrwxrwxrwx  1 root root     7 Dec  9 15:10 SSD_E1_S22_805861570 -> ../dm-6
0 lrwxrwxrwx  1 root root     8 Dec  9 15:10 HDD_E1_S06_575232712 -> ../dm-18
0 lrwxrwxrwx  1 root root     8 Dec  9 15:10 HDD_E0_S04_373259068 -> ../dm-20
0 lrwxrwxrwx  1 root root     8 Dec  9 15:10 HDD_E1_S11_575233388 -> ../dm-24
...
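The multipath alias names encode where a disk sits, which is useful when matching a dm-N device to a physical slot. The sketch below decodes lines such as 'HDD_E1_S06_575232712 -> ../dm-18'. The field meanings are an assumption inferred from this note (media type, E = enclosure/JBOD number matching the e0_/e1_ prefixes in 'oakcli show diskgroup', S = slot, then a disk identifier); the MpathAlias type and parse_alias helper are hypothetical.

```python
import re
from typing import NamedTuple, Optional

class MpathAlias(NamedTuple):
    media: str                 # HDD or SSD
    enclosure: int             # ASSUMPTION: E<n> is the JBOD/enclosure number
    slot: int                  # ASSUMPTION: S<nn> is the slot number
    disk_id: str               # trailing numeric identifier
    partition: Optional[str]   # p1 / p2, or None for the whole-disk device
    dm_device: Optional[str]   # dm-N target when parsing 'ls -l' symlink lines

LINE_RE = re.compile(
    r"(?P<media>HDD|SSD)_E(?P<enc>\d+)_S(?P<slot>\d+)_(?P<id>\d+)(?P<part>p\d+)?"
    r"(?:\s*->\s*\.\./(?P<dm>dm-\d+))?"
)

def parse_alias(line):
    """Decode one alias (optionally with its '-> ../dm-N' target); None if absent."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return MpathAlias(m["media"], int(m["enc"]), int(m["slot"]),
                      m["id"], m["part"], m["dm"])
```

Sorting the parsed results by (enclosure, slot) gives a slot-ordered view of the 'ls -altr' output above.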

 

 

'fwupdate list disk' shows

-  the controller
-  then the disks attached to that controller

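In the example below, fwupdate first lists the internal controller (c0) with the two OS disks underneath it, then the two external controllers (c1 and c2). Each external controller lists the same 48 shared disks (slots 0-23 in chassis 0 and in chassis 1), which is consistent with every shared disk having one path through each external controller. (In one reported case, fwupdate showed the first controller with no disks underneath it, then the second controller with disks underneath it.)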

[root@oda1 ~]# fwupdate list disk

==================================================
CONTROLLER
==================================================
ID    Type   Manufacturer   Model     Product Name              FW Version     BIOS Version   EFI Version    FCODE Version  Package Version  NVDATA Version    XML Support
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
c0    SAS    LSI Logic      0x0072    SGX-SAS6-INT-Z            11.05.02.00    07.21.04.00    07.18.02.11    01.00.60.00    -                       10.03.00.26       N/A    

DISKS
===============
ID        Manufacturer   Model                    Chassis Slot   Type   Media   Size (GB) FW Version XML Support
-----------------------------------------------------------------------------------------------------------
c0d0      HITACHI        H109060SESUN600G    -       0      -        HDD       600         A31A       N/A
c0d1      HITACHI        H109060SESUN600G    -       1      -        HDD       600         A31A       N/A

==================================================
CONTROLLER
==================================================
ID    Type   Manufacturer   Model     Product Name              FW Version     BIOS Version   EFI Version    FCODE Version  Package Version  NVDATA Version    XML Support
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
c1    SAS    LSI Logic      0x0072    SGX-SAS6-EXT-Z            11.05.02.00    07.21.04.00    07.18.02.07    01.00.60.00    -                10.03.00.24       N/A    

DISKS
===============
ID        Manufacturer           Model               Chassis Slot   Type   Media Size(GB) FW Version XML Support
-----------------------------------------------------------------------------------------------------------
c1d0       HITACHI        H109090SESUN900G     0        0      sas    HDD     900       A31A       N/A
c1d1       HITACHI        H109090SESUN900G     0        1      sas    HDD     900       A31A       N/A
c1d2       HITACHI        H109090SESUN900G     0        2      sas    HDD     900       A31A       N/A
c1d3       HITACHI        H109090SESUN900G     0        3      sas    HDD     900       A31A       N/A
c1d4       HITACHI        H109090SESUN900G     0        4      sas    HDD     900       A31A       N/A
c1d5       HITACHI        H109090SESUN900G     0        5      sas    HDD     900       A31A       N/A
c1d6       HITACHI        H109090SESUN900G     0        6      sas    HDD     900       A31A       N/A
c1d7       HITACHI        H109090SESUN900G     0        7      sas    HDD     900       A31A       N/A
c1d8       HITACHI        H109090SESUN900G     0        8      sas    HDD     900       A31A       N/A
c1d9       HITACHI        H109090SESUN900G     0        9      sas    HDD     900       A31A       N/A
c1d10     HITACHI        H109090SESUN900G     0       10     sas    HDD     900       A31A       N/A
c1d11     HITACHI        H109090SESUN900G     0       11     sas    HDD     900       A31A       N/A
c1d12     HITACHI        H109090SESUN900G     0       12     sas    HDD     900       A31A       N/A
c1d13     HITACHI        H109090SESUN900G     0       13     sas    HDD     900       A31A       N/A
c1d14     HITACHI        H109090SESUN900G     0       14     sas    HDD     900       A31A       N/A
c1d15     HITACHI        H109090SESUN900G     0       15     sas    HDD     900       A31A       N/A
c1d16     HITACHI        H109090SESUN900G     0       16     sas    HDD     900       A31A       N/A
c1d17     HITACHI        H109090SESUN900G     0       17     sas    HDD     900       A31A       N/A
c1d18     HITACHI        H109090SESUN900G     0       18     sas    HDD     900       A31A       N/A
c1d19     HITACHI        H109090SESUN900G     0       19     sas    HDD     900       A31A       N/A
c1d20     STEC             Z16IZF4EUSUN200G      0       20     sas    SSD     200       9432       N/A
c1d21     STEC             Z16IZF4EUSUN200G      0       21     sas    SSD     200       9432       N/A
c1d22     STEC             Z16IZF4EUSUN200G      0       22     sas    SSD     200       9432       N/A
c1d23     STEC             Z16IZF4EUSUN200G      0       23     sas    SSD     200       9432       N/A
c1d24     HITACHI        H109090SESUN900G     1        0      sas    HDD     900       A31A       N/A
c1d25     HITACHI        H109090SESUN900G     1        1      sas    HDD     900       A31A       N/A
c1d26     HITACHI        H109090SESUN900G     1        2      sas    HDD     900       A31A       N/A
c1d27     HITACHI        H109090SESUN900G     1        3      sas    HDD     900       A31A       N/A
c1d28     HITACHI        H109090SESUN900G     1        4      sas    HDD     900       A31A       N/A
c1d29     HITACHI        H109090SESUN900G     1        5      sas    HDD     900       A31A       N/A
c1d30     HITACHI        H109090SESUN900G     1        6      sas    HDD     900       A31A       N/A
c1d31     HITACHI        H109090SESUN900G     1        7      sas    HDD     900       A31A       N/A
c1d32     HITACHI        H109090SESUN900G     1        8      sas    HDD     900       A31A       N/A
c1d33     HITACHI        H109090SESUN900G     1        9      sas    HDD     900       A31A       N/A
c1d34     HITACHI        H109090SESUN900G     1       10     sas    HDD     900       A31A       N/A
c1d35     HITACHI        H109090SESUN900G     1       11     sas    HDD     900       A31A       N/A
c1d36     HITACHI        H109090SESUN900G     1       12     sas    HDD     900       A31A       N/A
c1d37     HITACHI        H109090SESUN900G     1       13     sas    HDD     900       A31A       N/A
c1d38     HITACHI        H109090SESUN900G     1       14     sas    HDD     900       A31A       N/A
c1d39     HITACHI        H109090SESUN900G     1       15     sas    HDD     900       A31A       N/A
c1d40     HITACHI        H109090SESUN900G     1       16     sas    HDD     900       A31A       N/A
c1d41     HITACHI        H109090SESUN900G     1       17     sas    HDD     900       A31A       N/A
c1d42     HITACHI        H109090SESUN900G     1       18     sas    HDD     900       A31A       N/A
c1d43     HITACHI        H109090SESUN900G     1       19     sas    HDD     900       A31A       N/A
c1d44     STEC             Z16IZF4EUSUN200G      1       20     sas    SSD     200        9432       N/A
c1d45     STEC             Z16IZF4EUSUN200G      1       21     sas    SSD     200        9432       N/A
c1d46     STEC             Z16IZF4EUSUN200G      1       22     sas    SSD     200        9432       N/A
c1d47     STEC             Z16IZF4EUSUN200G      1       23     sas    SSD     200        9432       N/A

==================================================
CONTROLLER
==================================================
ID    Type   Manufacturer   Model     Product Name              FW Version     BIOS Version   EFI Version    FCODE Version  Package Version  NVDATA Version    XML Support
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
c2    SAS    LSI Logic         0x0072    SGX-SAS6-EXT-Z          11.05.02.00    07.21.04.00    07.18.02.07    01.00.60.00    -                10.03.00.24       N/A    

DISKS
===============
ID        Manufacturer    Model                      Chassis Slot  Type   Media  Size(GB) FW Version XML Support
-----------------------------------------------------------------------------------------------------------
c2d0      HITACHI        H109090SESUN900G     0         0      sas    HDD     900       A31A       N/A
c2d1      HITACHI        H109090SESUN900G     0         1      sas    HDD     900       A31A       N/A
c2d2      HITACHI        H109090SESUN900G     0         2      sas    HDD     900       A31A       N/A
c2d3      HITACHI        H109090SESUN900G     0         3      sas    HDD     900       A31A       N/A
c2d4      HITACHI        H109090SESUN900G     0         4      sas    HDD     900       A31A       N/A
c2d5      HITACHI        H109090SESUN900G     0         5      sas    HDD     900       A31A       N/A
c2d6      HITACHI        H109090SESUN900G     0         6      sas    HDD     900       A31A       N/A
c2d7      HITACHI        H109090SESUN900G     0         7      sas    HDD     900       A31A       N/A
c2d8      HITACHI        H109090SESUN900G     0         8      sas    HDD     900       A31A       N/A
c2d9      HITACHI        H109090SESUN900G     0         9      sas    HDD     900       A31A       N/A
c2d10     HITACHI        H109090SESUN900G    0       10     sas    HDD     900       A31A       N/A
c2d11     HITACHI        H109090SESUN900G    0       11     sas    HDD     900       A31A       N/A
c2d12     HITACHI        H109090SESUN900G    0       12     sas    HDD     900       A31A       N/A
c2d13     HITACHI        H109090SESUN900G    0       13     sas    HDD     900       A31A       N/A
c2d14     HITACHI        H109090SESUN900G    0       14     sas    HDD     900       A31A       N/A
c2d15     HITACHI        H109090SESUN900G    0       15     sas    HDD     900       A31A       N/A
c2d16     HITACHI        H109090SESUN900G    0       16     sas    HDD     900       A31A       N/A
c2d17     HITACHI        H109090SESUN900G    0       17     sas    HDD     900       A31A       N/A
c2d18     HITACHI        H109090SESUN900G    0       18     sas    HDD     900       A31A       N/A
c2d19     HITACHI        H109090SESUN900G    0       19     sas    HDD     900       A31A       N/A
c2d20     STEC              Z16IZF4EUSUN200G    0       20     sas    SSD     200       9432       N/A
c2d21     STEC              Z16IZF4EUSUN200G    0       21     sas    SSD     200       9432       N/A
c2d22     STEC              Z16IZF4EUSUN200G    0       22     sas    SSD     200       9432       N/A
c2d23     STEC              Z16IZF4EUSUN200G    0       23     sas    SSD     200       9432       N/A
c2d24     HITACHI        H109090SESUN900G    1        0      sas    HDD     900       A31A       N/A
c2d25     HITACHI        H109090SESUN900G    1        1      sas    HDD     900       A31A       N/A
c2d26     HITACHI        H109090SESUN900G    1        2      sas    HDD     900       A31A       N/A
c2d27     HITACHI        H109090SESUN900G    1        3      sas    HDD     900       A31A       N/A
c2d28     HITACHI        H109090SESUN900G    1        4      sas    HDD     900       A31A       N/A
c2d29     HITACHI        H109090SESUN900G    1        5      sas    HDD     900       A31A       N/A
c2d30     HITACHI        H109090SESUN900G    1        6      sas    HDD     900       A31A       N/A
c2d31     HITACHI        H109090SESUN900G    1        7      sas    HDD     900       A31A       N/A
c2d32     HITACHI        H109090SESUN900G    1        8      sas    HDD     900       A31A       N/A
c2d33     HITACHI        H109090SESUN900G    1        9      sas    HDD     900       A31A       N/A
c2d34     HITACHI        H109090SESUN900G    1       10     sas    HDD     900       A31A       N/A
c2d35     HITACHI        H109090SESUN900G    1       11     sas    HDD     900       A31A       N/A
c2d36     HITACHI        H109090SESUN900G    1       12     sas    HDD     900       A31A       N/A
c2d37     HITACHI        H109090SESUN900G    1       13     sas    HDD     900       A31A       N/A
c2d38     HITACHI        H109090SESUN900G    1       14     sas    HDD     900       A31A       N/A
c2d39     HITACHI        H109090SESUN900G    1       15     sas    HDD     900       A31A       N/A
c2d40     HITACHI        H109090SESUN900G    1       16     sas    HDD     900       A31A       N/A
c2d41     HITACHI        H109090SESUN900G    1       17     sas    HDD     900       A31A       N/A
c2d42     HITACHI        H109090SESUN900G    1       18     sas    HDD     900       A31A       N/A
c2d43     HITACHI        H109090SESUN900G    1       19     sas    HDD     900       A31A       N/A
c2d44     STEC              Z16IZF4EUSUN200G    1       20     sas    SSD     200       9432       N/A
c2d45     STEC              Z16IZF4EUSUN200G    1       21     sas    SSD     200       9432       N/A
c2d46     STEC              Z16IZF4EUSUN200G    1       22     sas    SSD     200       9432       N/A
c2d47     STEC              Z16IZF4EUSUN200G    1       23     sas    SSD     200       9432       N/A
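With 96 disk rows, a missing path or a firmware mismatch is hard to spot by eye. The minimal sketch below parses the DISKS rows of 'fwupdate list disk' and reports (chassis, slot) pairs seen by one external controller but not by another, plus the firmware versions seen per model. The column order (ID, Manufacturer, Model, Chassis, Slot, Type, Media, Size, FW, XML) is assumed from the sample above, and treating a gap as a possible lost path is an interpretation, not an Oracle rule.

```python
import re
from collections import defaultdict

ROW_RE = re.compile(r"^(c\d+)d\d+\s")  # disk rows such as 'c1d0  HITACHI ...'

def parse_disk_rows(text):
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if not m:
            continue
        f = line.split()
        # Assumed columns: ID MANUF MODEL CHASSIS SLOT TYPE MEDIA SIZE FW XML
        rows.append({"ctrl": m.group(1), "model": f[2], "chassis": f[3],
                     "slot": f[4], "media": f[6], "fw": f[8]})
    return rows

def slot_sets(rows):
    """(chassis, slot) pairs per controller; rows with chassis '-' (internal OS disks) are skipped."""
    seen = defaultdict(set)
    for r in rows:
        if r["chassis"] != "-":
            seen[r["ctrl"]].add((int(r["chassis"]), int(r["slot"])))
    return dict(seen)

def path_gaps(rows):
    """Slots seen by some external controller but not by all of them."""
    sets = slot_sets(rows)
    if not sets:
        return {}
    union = set().union(*sets.values())
    return {c: sorted(union - s) for c, s in sets.items() if union - s}

def fw_by_model(rows):
    fw = defaultdict(set)
    for r in rows:
        fw[r["model"]].add(r["fw"])
    return {model: sorted(v) for model, v in fw.items()}
```

On the full output above, path_gaps should return an empty dict and each model should show a single firmware version.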

 

 

# cd /dev/mapper

[root@oda mapper]#  ls -al                                                             << Sorted by NAME: helpful for checking the disks in slot order 0-23
...
crw-------  1 root root      10, 236 Dec  9 15:09 control
brw-rw----  1 grid asmadmin 252,  12 Dec  9 15:10  HDD_E0_S00_372932264
brw-rw----  1 grid asmadmin 252,  64 Dec 13 01:30 HDD_E0_S00_372932264p1
brw-rw----  1 grid asmadmin 252,  65 Dec 10 06:10 HDD_E0_S00_372932264p2           << Each HDD should be listed three times: the base device, p1 and p2
brw-rw----  1 grid asmadmin 252,  13 Dec  9 15:10  HDD_E0_S01_373745920
brw-rw----  1 grid asmadmin 252,  82 Dec 13 01:35 HDD_E0_S01_373745920p1
brw-rw----  1 grid asmadmin 252,  83 Dec 10 06:10 HDD_E0_S01_373745920p2
...
...
brw-rw----  1 grid asmadmin 252,    6 Dec  9 15:10 SSD_E1_S22_805861570
brw-rw----  1 grid asmadmin 252,  96 Dec  9 15:10 SSD_E1_S22_805861570p1          << Each SSD should be listed twice: the base device and p1 only
brw-rw----  1 grid asmadmin 252,    7 Dec  9 15:10 SSD_E1_S23_805820183
brw-rw----  1 grid asmadmin 252, 113 Dec  9 15:10 SSD_E1_S23_805820183p1

[root@oda mapper]#  ls -altr                                                           << Sorted by TIME, oldest first: the most recently added or changed devices (for example during a disk replacement) appear at the bottom
...
crw-------  1 root root      10, 236 Dec  9 15:09 control
brw-rw----  1 grid asmadmin 252,   7 Dec  2 15:10 SSD_E1_S23_805820183
brw-rw----  1 grid asmadmin 252,   6 Dec  2 15:10 SSD_E1_S22_805861570
brw-rw----  1 grid asmadmin 252,   5 Dec  2 15:10 SSD_E1_S21_805861578
...
...
brw-rw----  1 grid asmadmin 252,  67 Dec 13 01:49 SSD_E0_S20_805852554p1
brw-rw----  1 grid asmadmin 252,  48 Dec 13 01:49 SSD_E0_S22_805852510p1
brw-rw----  1 grid asmadmin 252,  98 Dec 13 01:49 HDD_E0_S17_372466360p1

 

CHECK THE SHELF / JBOD ID, FW AND WHETHER EACH EXPANDER IS PRIMARY OR SECONDARY

-  May be useful when troubleshooting the installation of a second JBOD

 

[root@oda1 ~]# fwupdate list expander

==================================================
CONTROLLER
==================================================
ID    Type   Manufacturer   Model     Product Name              FW Version     BIOS Version   EFI Version     FCODE Version  Package Version  NVDATA Version    XML Support
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
c0    SAS    LSI Logic      0x0072    SGX-SAS6-EXT-Z            11.05.03.00    07.21.09.00    07.22.05.00    01.00.62.00    -                        10.03.00.32       N/A

EXPANDERS
===============
ID        Chassis Slot Manufacturer   Model              Expander Name       FW Version     XML Support
------------------------------------------------------------------------------------------------------
c0x0      0       -    ORACLE         DE2-24P                     Primary               0018            N/A
c0x1      1       -    ORACLE         DE2-24P                    Primary              0018            N/A

==================================================
CONTROLLER
==================================================
ID    Type   Manufacturer   Model     Product Name              FW Version     BIOS Version   EFI Version    FCODE Version  Package Version  NVDATA Version    XML Support
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
c1    SAS    LSI Logic      0x0072    SGX-SAS6-EXT-Z            11.05.03.00    07.21.09.00    07.22.05.00    01.00.62.00    -                         10.03.00.32       N/A

EXPANDERS
===============
ID        Chassis Slot Manufacturer   Model             Expander Name       FW Version     XML Support
------------------------------------------------------------------------------------------------------
c1x0      0       -    ORACLE           DE2-24P               Secondary            0018           N/A
c1x1      1       -    ORACLE           DE2-24P               Secondary            0018           N/A
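For second-JBOD installation issues, a compact summary of expander roles and firmware per controller can be easier to compare than the raw listing. The minimal sketch below parses the EXPANDERS rows (layout assumed from the sample above). In that sample, c0 sees only Primary expanders and c1 only Secondary ones, all at firmware 0018; whether a mixed or different result indicates a cabling or firmware problem is an assumption to confirm against the ODA cabling documentation.

```python
import re
from collections import defaultdict

EXP_RE = re.compile(r"^(c\d+)x\d+\s")  # expander rows such as 'c0x0  0  -  ORACLE ...'

def summarize_expanders(text):
    """Return ({controller: sorted expander roles}, sorted firmware versions seen)."""
    roles, fws = defaultdict(set), set()
    for line in text.splitlines():
        m = EXP_RE.match(line.strip())
        if not m:
            continue
        f = line.split()
        # Assumed columns: ID CHASSIS SLOT MANUFACTURER MODEL EXPANDER_NAME FW XML
        roles[m.group(1)].add(f[5])
        fws.add(f[6])
    return {c: sorted(r) for c, r in roles.items()}, sorted(fws)
```

A controller listing more than one role, or more than one firmware version overall, is worth a closer look.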

[root@odax3rm1 ~]# mdadm --detail /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Thu Sep 25 10:29:00 2014
     Raid Level : raid1
     Array Size : 513984 (502.02 MiB 526.32 MB)
  Used Dev Size : 513984 (502.02 MiB 526.32 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Dec  7 04:22:07 2014
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 85566c22:9e98c359:d2aeac35:2319ee66
         Events : 0.53

    Number   Major   Minor   RaidDevice State
       0      70        1        0      active sync   /dev/sdcs1   << Good: both members are 'active sync'
       1      70       17        1      active sync   /dev/sdct1


[root@odax3rm1 ~]# mdadm --detail /dev/md1
/dev/md1:
        Version : 0.90
  Creation Time : Thu Sep 25 10:26:23 2014
     Raid Level : raid1
     Array Size : 20482752 (19.53 GiB 20.97 GB)
  Used Dev Size : 20482752 (19.53 GiB 20.97 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Fri Dec 12 13:13:42 2014
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 38a9a77d:a20c944a:fe637981:8d9f2471
         Events : 0.21

    Number   Major   Minor   RaidDevice State
       0      70        3        0      active sync   /dev/sdcs3
       1      70       19        1      active sync   /dev/sdct3

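The two 'mdadm --detail' outputs above show healthy RAID1 mirrors of the OS disks. The minimal sketch below checks captured output for the same signs: a clean or active State, Active Devices equal to Raid Devices, no Failed Devices, and every member 'active sync'. These criteria are this note's reading of the sample (plus the standard "active" state mdadm reports while an array is being written), not an official Oracle rule; the helper names are hypothetical.

```python
HEALTHY_STATES = ("clean", "active")  # "active" is normal while the array is being written

def parse_mdadm_detail(text):
    """Return (header fields dict, [(member state, device or None), ...])."""
    info, members, in_table = {}, [], False
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("Number"):
            in_table = True  # member table header: Number Major Minor RaidDevice State
            continue
        if in_table:
            fields = stripped.split()
            # Device is the first field starting with /dev/ (absent for 'removed' slots);
            # anything after it, such as a '<<' annotation, is ignored.
            dev_idx = next((i for i, f in enumerate(fields) if f.startswith("/dev/")), len(fields))
            device = fields[dev_idx] if dev_idx < len(fields) else None
            members.append((" ".join(fields[4:dev_idx]), device))
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            info[key.strip()] = value.strip()
    return info, members

def mirror_problems(text):
    """Return a list of human-readable problems; empty means the mirror looks healthy."""
    info, members = parse_mdadm_detail(text)
    problems = []
    if info.get("State") not in HEALTHY_STATES:
        problems.append(f"State is {info.get('State')!r}")
    if info.get("Failed Devices", "0") != "0":
        problems.append(f"Failed Devices = {info['Failed Devices']}")
    if info.get("Active Devices") != info.get("Raid Devices"):
        problems.append("Active Devices != Raid Devices")
    for state, device in members:
        if state != "active sync":
            problems.append(f"member {device} is {state!r}")
    return problems
```

Run it once per array (/dev/md0, /dev/md1) and compare with 'cat /proc/mdstat'.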

 

oakcli validate -c OSDiskStorage

[root@odax3rm1-net1 ~]# oakcli validate -c OSDiskStorage

INFO: Checking Operating System Storage
SUCCESS: The OS disks have the boot stamp
RESULT: Logical Volume   No volume groups found in  Volume group is of size
RESULT: Device /dev/xvda2 is mounted on / of type ext3 in (rw)
RESULT: Device /dev/xvda1 is mounted on /boot of type ext3 in (rw)
RESULT: Device /dev/xvdb1 is mounted on /u01 of type ext3 in (rw)
RESULT: / has 31100 MB free out of total 55852 MB
RESULT: /boot has 393 MB free out of total 460 MB
RESULT: /u01 has 50489 MB free out of total 93868 MB
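The free-space RESULT lines are easier to judge as percentages. The minimal sketch below extracts them from pasted 'oakcli validate -c OSDiskStorage' output and flags filesystems below a threshold. The 20% default is a hypothetical value chosen for illustration, not an Oracle recommendation.

```python
import re

FREE_RE = re.compile(r"RESULT: (\S+) has (\d+) MB free out of total (\d+) MB")

def low_space(text, min_free_pct=20.0):
    """Return [(mount point, free %)] for filesystems below min_free_pct.
    min_free_pct is a hypothetical threshold; adjust to local policy."""
    flagged = []
    for mount, free, total in FREE_RE.findall(text):
        pct = 100.0 * int(free) / int(total)
        if pct < min_free_pct:
            flagged.append((mount, round(pct, 1)))
    return flagged
```

For the output above, / is about 55.7% free, /boot about 85.4% and /u01 about 53.8%, so nothing is flagged at the default threshold.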

  

  oakcli validate -c StorageTopology


[root@odax3rm1-net1 ~]# oakcli validate -c StorageTopology

 It may take a while. Please wait...

 INFO    : ODA Topology Verification
 INFO    : Running on Node0
 INFO    : Check hardware type
 SUCCESS : Type of hardware found : X3-2
 INFO    : Check for Environment(Bare Metal or Virtual Machine)
 SUCCESS : Type of environment found : Virtual Machine(ODA BASE)
 SUCCESS : Number of External LSI SAS controller found : 2
 INFO    : Check for Controllers correct PCIe slot address
 SUCCESS : External LSI SAS controller 0 : 00:15.0
 SUCCESS : External LSI SAS controller 1 : 00:16.0
 INFO    : Check if JBOD powered on
 SUCCESS : 2JBOD : Powered-on
 INFO    : Check for correct number of EBODS(2 or 4)
 SUCCESS : EBOD found : 4
 INFO    : Check for External Controller 0
 SUCCESS : Cable check for port 0 on controller 0
 SUCCESS : Cable check for port 1 on controller 0
 SUCCESS : Overall Cable check for controller 0
 INFO    : Check for External Controller 1
 SUCCESS : Cable check for port 0 on controller 1
 SUCCESS : Cable check for port 1 on controller 1
 SUCCESS : Overall Cable check for controller 1
 INFO    : Check for overall status of cable validation on Node0
 SUCCESS : Overall Cable Validation on Node0
 SUCCESS : JBOD0 Nickname set correctly : Oracle Database Appliance - E0
 SUCCESS : JBOD1 Nickname set correctly : Oracle Database Appliance - E1
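In a long StorageTopology run, only lines that are neither INFO nor SUCCESS need attention. The minimal sketch below filters pasted output down to those lines. It deliberately does not assume which other status tags oakcli may print; any tag other than INFO or SUCCESS is returned.

```python
def topology_issues(text):
    """Return 'TAG : message' lines whose tag is neither INFO nor SUCCESS."""
    issues = []
    for line in text.splitlines():
        tag, sep, _ = line.strip().partition(":")
        # Lines without a colon (e.g. 'It may take a while. Please wait...') are skipped.
        if sep and tag.strip() not in ("INFO", "SUCCESS"):
            issues.append(line.strip())
    return issues
```

For the run above the result is an empty list.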

 

 

[root@odax3rm1-net1 ~]# oakcli validate -c SharedStorage

INFO: Checking Shared Storage

RESULT: Disk HDD_E0_S00_372932264 path1 status active device sdv with status active path2 status active device sdat with status active
SUCCESS: HDD_E0_S00_372932264 has both the paths up and active
RESULT: Disk HDD_E0_S01_373745920 path1 status active device sdw with status active path2 status active device sdau with status active
SUCCESS: HDD_E0_S01_373745920 has both the paths up and active
RESULT: Disk HDD_E0_S02_373293708 path1 status active device sdx with status active path2 status active device sdav with status active
SUCCESS: HDD_E0_S02_373293708 has both the paths up and active
RESULT: Disk HDD_E0_S03_373765744 path1 status active device sdz with status active path2 status active device sdaw with status active
SUCCESS: HDD_E0_S03_373765744 has both the paths up and active
RESULT: Disk HDD_E0_S04_373259068 path1 status active device sdaa with status active path2 status active device sdax with status active
SUCCESS: HDD_E0_S04_373259068 has both the paths up and active
RESULT: Disk HDD_E0_S05_373543228 path1 status active device sdad with status active path2 status active device sday with status active
SUCCESS: HDD_E0_S05_373543228 has both the paths up and active
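With 24 or 48 shared disks, it helps to reduce the SharedStorage output to the disks that need attention. The minimal sketch below does two checks on pasted output: it parses RESULT lines in the exact format shown above and reports any path whose status is not 'active', and, because a degraded disk might be reported in a different format, it also lists every disk that has a RESULT line but no matching "has both the paths up and active" SUCCESS line. The helper names are hypothetical.

```python
import re

RESULT_RE = re.compile(
    r"RESULT: Disk (\S+) path1 status (\S+) device (\S+) with status (\S+) "
    r"path2 status (\S+) device (\S+) with status (\S+)"
)

def degraded_paths(text):
    """Return {disk: [(path, device), ...]} for paths not reported as active."""
    bad = {}
    for m in RESULT_RE.finditer(text):
        disk = m.group(1)
        for n, (path_status, dev, dev_status) in enumerate([m.group(2, 3, 4), m.group(5, 6, 7)], start=1):
            if path_status != "active" or dev_status != "active":
                bad.setdefault(disk, []).append((f"path{n}", dev))
    return bad

def disks_without_success(text):
    """Disks with a RESULT line but no 'both the paths up and active' SUCCESS line."""
    results = set(re.findall(r"RESULT: Disk (\S+)", text))
    ok = set(re.findall(r"SUCCESS: (\S+) has both the paths up and active", text))
    return sorted(results - ok)
```

For the six disks shown above, both functions return empty results.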

 


  INTERESTING examples: commands that are useful vs. commands that are not


No useful information from 'oakcli show validation'

[root@odax3rm1-net1 ~]# oakcli show validation storage failures
Show soft validation failures                            -- Nothing reported: this only confirms that the command ran

[root@odax3rm1-net1 ~]#

Same system, different command: 'oakcli show storage -errors' does report a problem with e1_pd_23

[root@odax3rm1-net1 ~]# oakcli show storage -errors

ERROR: Disk e1_pd_23 [/dev/sdh] 35000a7203007d717@1327FM4013 belongs to another host's chassis#: 1252FM400F]

Same system: /opt/oracle/extapi/asmappl.config lists SSD slot 23 for both JBODs (E0 and E1)

 

[root@odax3rm1-net1 ~]# grep S23 /opt/oracle/extapi/asmappl.config
disk  /dev/mapper/SSD_E0_S23_805852551p1        0                     23               1
disk  /dev/mapper/SSD_E1_S23_805820183p1        1                     23               1   << No evidence of a problem here
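A quick consistency check on asmappl.config compares the enclosure and slot encoded in each device name with the numeric columns that follow it. The minimal sketch below assumes, from the two lines above, that the third and fourth columns are the enclosure and slot; that column meaning is an inference, not documented here. A check like this passes on the lines above, which illustrates why the config file alone is not conclusive.

```python
import re

NAME_RE = re.compile(r"_E(\d+)_S(\d+)_")

def config_mismatches(text):
    """Return 'disk' lines whose (assumed) enclosure/slot columns disagree with the device name."""
    bad = []
    for line in text.splitlines():
        f = line.split()
        if len(f) < 4 or f[0] != "disk":
            continue
        m = NAME_RE.search(f[1])
        # ASSUMPTION: f[2] is the enclosure number and f[3] is the slot number.
        if m and (int(m.group(1)), int(m.group(2))) != (int(f[2]), int(f[3])):
            bad.append(line.strip())
    return bad
```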

  Yet 'oakcli show diskgroup redo' shows the same disk (e1_pd_23) as FAILED / DiskRemoved

 [root@odax3rm1-net1 ~]# oakcli show diskgroup redo
        ASM_DISK        PATH                                            DISK            STATE           STATE_DETAILS

        e0_redo_20      /dev/mapper/SSD_E0_S20_805852554p1              e0_pd_20        ONLINE          Good
        e0_redo_21      /dev/mapper/SSD_E0_S21_805852541p1              e0_pd_21        ONLINE          Good
        e0_redo_22      /dev/mapper/SSD_E0_S22_805852510p1              e0_pd_22        ONLINE          Good
        e0_redo_23      /dev/mapper/SSD_E0_S23_805852551p1              e0_pd_23        ONLINE          Good
        e1_redo_20      /dev/mapper/SSD_E1_S20_805861591p1              e1_pd_20        ONLINE          Good
        e1_redo_21      /dev/mapper/SSD_E1_S21_805861578p1              e1_pd_21        ONLINE          Good
        e1_redo_22      /dev/mapper/SSD_E1_S22_805861570p1              e1_pd_22        ONLINE          Good
        e1_redo_23      /dev/mapper/SSD_E1_S23_805820183p1              e1_pd_23        FAILED          DiskRemoved
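To tie an 'oakcli show storage -errors' message to its ASM disk, the disk name in the ERROR line can be looked up in the diskgroup listing. The minimal sketch below does that for pasted output of both commands; the row layout is assumed from the samples in this note and the function name is hypothetical.

```python
import re

def correlate(errors_text, diskgroup_text):
    """Map each disk named in an 'ERROR: Disk <name>' line to its diskgroup row
    (ASM_DISK, PATH, STATE, STATE_DETAILS), or None if it is not in that diskgroup."""
    named = re.findall(r"ERROR: Disk (\S+)", errors_text)
    rows = {}
    for line in diskgroup_text.splitlines():
        f = line.split()
        # Assumed layout: ASM_DISK PATH DISK STATE STATE_DETAILS
        if len(f) >= 5 and f[1].startswith("/dev/mapper/"):
            rows[f[2]] = (f[0], f[1], f[3], f[4])
    return {disk: rows.get(disk) for disk in named}
```

For this system it would map e1_pd_23 to e1_redo_23, /dev/mapper/SSD_E1_S23_805820183p1, FAILED, DiskRemoved.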

 

 

References

<NOTE:1519879.1> - ODA (Oracle Database Appliance) and ASM 2.1 up to 2.10 Storage Options for V1, X3-2 and X4-2 Hardware
<NOTE:1435946.1> - How to Replace an ODA (Oracle Database Appliance) FAILED/ PredictiveFail Shared Storage Disk
<NOTE:1496114.1> - ODA (Oracle Database Appliance): The Steps to replace multiple disks failing concurrently
<NOTE:550569.1> - R12: Vertex Data - How to integrate
<NOTE:1550569.1> - How to Troubleshoot OS disk issues on the Oracle Database Appliance
<NOTE:1401471.1> - ODA After replacing a disk on Oracle Database Appliance the new disk is not added to ASM 2.1 to 2.4
<NOTE:1382300.1> - ODA (Oracle Database Appliance) : How to replace FAILED SYSTEM BOOT DISK
<NOTE:1420126.1> - ODA (Oracle Database Appliance) Different Disks Randomly Disappear After a Reboot
<NOTE:1990134.1> - Replace new disk failed at ODA (Oracle Database Appliance) 12 version
<NOTE:1497610.1> - Determining when Disks should be replaced on Oracle Database Appliance
<NOTE:1390058.1> - Oracle Database Appliance Diagnostic Information required for Disk Failures
<NOTE:1457254.1> - ODA (Oracle Database Appliance): after disk failure some disks are in ASM mount_status 'CLOSED'
<NOTE:1536486.1> - Replaced ODA drive lists as “UNKNOWN PARTIAL PathsNotLoaded”

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.