Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Troubleshooting Sure Solution
1596826.1 : ODA Disk Issue: Suggested Common Disk Diagnostic Commands or Scripts with Tips, Best Practice and Examples
As of ODA / OAK 2.10, the _oakcli stordiag ..._ command is the single source of truth for disk conditions, disk diagnostics and troubleshooting. For older versions (pre-2.10) there are many different approaches to disk-based problems, involving multiple diagnostics and commands. However, not all commands provide the information they appear to when troubleshooting ODA disk problems. This is a living document intended to catalog some of the more commonly used commands, with examples and a synopsis of how useful each one is depending on the situation. The note is currently better suited to casual review than to rigorous troubleshooting, because new commands are still being found and cataloged; it is therefore internal only at this time.
Applies to:
Oracle Database Appliance - Version All Versions to All Versions [Release All Releases]
Oracle Database Appliance Software - Version 2.1.0.1 to 12.1.2.7 [Release 2.1 to 12.1]
Oracle Database Appliance X4-2 - Version All Versions to All Versions [Release All Releases]
Oracle Database Appliance X3-2 - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.
Purpose
To assist with commands that can be used for debugging ODA disk issues.
Please use the following note for required Disk Diagnostic information:
Note 1390058.1 - Oracle Database Appliance Diagnostic Information required for Disk Failures
MINIMUM DISK REPLACEMENT DIAGNOSTICS
If Oakd is running on both nodes:
If more than one disk
Collect the oakcli stordiag output, e.g. oakcli stordiag e0_pd_02; if the disk is in the SECOND JBOD use e1_pd_02
oakcli stordiag pd_02
If you cannot run OAK
TIP: Before adding a disk back into the ODA, wait until you can confirm the old disk has been removed.
1) Confirm the HW version and deployment type (BM or ODAVP)
2) Stordiag runs several checks on EACH NODE for the disk - this is considered the best 'single source of truth' as of 2.10 and higher, outside HW checks
Example:
3) ASM is critical for the OVERALL ODA STABILITY and HEALTH - One disk may be looked at, but if replacing that disk exceeds the redundancy of the ASM Diskgroups, a much more serious problem can occur, bringing down the database or worse. IMPORTANT: Inspect the ASM alert.logs on BOTH NODES to confirm
4) Checking on BOTH nodes may quickly confirm whether the problem affects more than a single disk and whether it is isolated to a single node.
5) DISKDIAG is used by both EEST and HW and should always be one of the first pieces of information collected
Troubleshooting Steps
On ODA, with the latest releases, we depend on OAK as the single source of truth for any shared disk failures.
It is possible in the above case that there is a physical failure but some older versions of OAK could not detect this:
Disk Commands for Troubleshooting
odasundiag
./odasundiag.sh
oakcli
oakcli commands are issued as root and can be executed from /opt/oracle/oak/bin/
# oakcli manage diagcollect --storage --
# oakcli stordiag e#_pd_XX -- use pd_XX for ODA V1; otherwise use e#_pd_XX, where e# is e0 or e1 (enclosure 0 or 1) and "XX" is the slot number
# oakcli show env_hw -- Confirms HW version plus if on ODAVP where you are issuing the command from
# oakcli validate -c StorageTopology
# oakcli validate -c SharedStorage
# oakcli show validation storage
# oakcli validate -c OSDiskStorage
# oakcli validate -v -c OSDiskStorage -- for Boot / System disks
# oakcli show validation storage -errors -- 2.8+ Shows hard failure errors
# oakcli show storage -errors -- 2.7 Shows hard failure errors
# oakcli show diskgroup -- Should List the three groups
# oakcli show diskgroup [ DATA | RECO | REDO ] -- Select one of the three groups to get individual disk group disk details
# oakcli show validation storage failures | grep <disk resource name> -- Individual resource failure
# oakcli show validation storage failures -- Shows ALL soft errors
# oakcli addasmdisk ....
fw and disk
# fwupdate list disk -- Lists SYSTEM disks + Disk ID, Slot, Size, FW version, plus controller information
-- Not supported on V1
plus all disks: Name, Path, Type, state, state_details; ID, chassis, slot, type
# smartctl -a /dev/s...
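The resource naming used by oakcli stordiag (pd_XX on ODA V1, e0_pd_XX or e1_pd_XX on hardware with JBOD enclosures, slots 00 to 23) can be sketched as a small helper. This is only an illustration of the convention described in this note; the function name `stordiag_resource` is hypothetical and is not part of oakcli.

```python
def stordiag_resource(slot, enclosure=None):
    """Build the resource name to pass to `oakcli stordiag`.

    enclosure=None -> ODA V1 style name (pd_XX)
    enclosure=0/1  -> enclosure style name (e0_pd_XX / e1_pd_XX)
    Hypothetical helper, written only to illustrate the naming rule.
    """
    if not 0 <= slot <= 23:
        raise ValueError("slot must be between 0 and 23")
    if enclosure is None:
        return f"pd_{slot:02d}"
    if enclosure not in (0, 1):
        raise ValueError("enclosure must be 0 or 1")
    return f"e{enclosure}_pd_{slot:02d}"


if __name__ == "__main__":
    # Print the command line for slot 2 in the second JBOD.
    print("oakcli stordiag", stordiag_resource(2, enclosure=1))
```

The slot is zero-padded to two digits so that slot 2 becomes pd_02, matching the names used in the examples in this note.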
ASM
SQLPLUS> gv$asm_disk | gv$asm_diskgroup | gv$asm_operation
# view /opt/oracle/extapi/asmappl.config -- both nodes
asmcmd> lsdg
asmcmd> volinfo --all
asmcmd> lsof
asmcmd> volenable --all
kfed read /dev/mapper/..D_E......[ p1 or p2 ] -- optional | head -53
mapper
# ls -l /dev/mapper/*D*
More from /dev/mapper:
# cd /dev/mapper
# ls -al -- By Name
# ls -altr -- By Time
# ls -altr *S2* -- To check SSDs only
# ls -al /dev/mapper/*D*
mpath
# ls -al /dev/mpath
OS Disks
oakcli
# /opt/oracle/oak/bin/oakcli validate -v -c OSDiskStorage
mdstat
# cat /proc/mdstat
mdadm
mdadm --detail /dev/md0
mdadm --detail /dev/md1
fwupdate
# fwupdate list all -- Shows versions for disks, FW,
# fwupdate list disk -- Lists SYSTEM disks + Disk ID, Slot, Size, FW version, plus controller information
plus all disks: Name, Path, Type, state, state_details; ID, chassis, slot, type
lsscsi
[root@oda1 ~]# lsscsi -v | grep 600G ----- For the X3-2 this is issued on BM or DOM0 if using ODAVP e.g.
Disk Diag request bundled commands
Use the following for a data collection when you are diagnosing general storage and disk issues.
Please send the following files:
/opt/oracle/oak/onecmd/tmp/*
/etc/multipath.conf
/opt/oracle/extapi/asmappl.config
/opt/oracle/oak/log/<HOSTNAME>/oak/oakd* <-- make sure you place your hostname in the path, if this path was created /opt/oracle/oak/log/test/oak/oakd*
/opt/oracle/oak/conf/validation_props.xml file
/var/log/messages*
Output from the following commands:
multipath -l
ls -l /dev/mapper/*
fwupdate list all
oakcli validate -c storagetopology
oakcli show version -detail
JBOD / Storage Shelf
# fwupdate list expander -- Confirms Shelf/JBOD plus -- #ID, Chassis, Slot, Expander Name, FW Version, Manufacturer, Model(e.g.DE2-24P)
# oakcli validate -c SharedStorage
Disk Layouts
Note that the ODA X4-2 and X3-2 both use the same DE2-24P Storage Shelf
ALSO useful
Note: The information here in this Note provides some context and examples for various ODA commands:
oakcli stordiag <resource_name> Usage: oakcli stordiag -h | n
Example:
[root@odarm1 ~]# oakcli stordiag pd_01
-- Newer HW versions (X5-2, X4-2 or X3-2) use oakcli stordiag e0_pd_## for the first JBOD and e1_pd_## for the second, where e0 is the first JBOD enclosure, e1 is the second JBOD enclosure and ## is the disk number, 00 up to 23
Node Name : odarm1
Test : Diagnostic Test Description
1 : OAK Check
  NAME PATH TYPE STATE STATE_DETAILS
  pd_01 /dev/sdaw HDD ONLINE Good
2 : ASM Check
  ASM Disk Status : state mode_s mount_s header_s
3 : Smartctl Health Check
  SMART Health Status: FAILURE PREDICTION THRESHOLD EXCEEDED: ascq=0x5 [asc=5d, ascq=5]
4 : Multipathd Status
  multipathd running on system
5 : Multipath Status
  Device List : /dev/sdm /dev/sdaw
  Info:
  HDD_E0_S01_975092811 (35000c5003a1ebc4b) dm-14 SEAGATE,ST360057SSUN600G
  size=559G features='0' hwhandler='0' wp=rw
  |-+- policy='round-robin 0' prio=1 status=active
  | `- 6:0:10:0 sdm 8:192 active ready running
  `-+- policy='round-robin 0' prio=1 status=enabled
    `- 7:0:23:0 sdaw 67:0 active ready running
  IO Test Result:
  /dev/sdm : PASS
  /dev/sdaw : PASS
6 : Check Partition using fdisk
  Check using active device path: /dev/sdm
  Partition check on device /dev/sdm : FAIL
  Partition list found by fdisk for active device path: /dev/sdm
  Device Boot Start End Blocks Id System
  Check using passive device path: /dev/sdaw
  Partition check on device /dev/sdaw : FAIL
  Partition list found by fdisk for passive device path: /dev/sdaw
  Device Boot Start End Blocks Id System
7 : Device Mapper Diagnostics
  Mapper Device : dm-14
  IO Test Result:
  /dev/dm-14 : PASS
  [INFO]: No partition seen in /dev/mapper directory
8 : fwupdate
  ID Manufacturer Model Chassis Slot Type Media Size (GB) FW Version XML Support
  c1d1 SEAGATE ST360057SSUN600G 0 1 sas HDD 600 0B25 N/A
  c2d1 SEAGATE ST360057SSUN600G 0 1 sas HDD 600 0B25 N/A
9 : Fishwrap
  Controller "mpt2sas:0d:00.0"
  Disk /dev/sdm: SEAGATE ST360057SSUN600G (s/n "001116E0SQHG 6SL0SQHG"), bay 1
  Controller "mpt2sas:1f:00.0"
  Disk /dev/sdaw: SEAGATE ST360057SSUN600G (s/n "001116E0SQHG 6SL0SQHG"), bay 1
10 : SCSI INQUIRY
  Active multipath device /dev/sdm : PASS
  Passive multipath device /dev/sdaw : PASS
11 : Multipath Conf for device
  multipath {
  wwid 35000c5003a1ebc4b
  alias HDD_E0_S01_975092811
  }
12 : Last five LSI Events Received for slot 1
  oakd.l02-2013-08-23 22:28:10.255: [ ADAPTER][1217067328] H Received new event from LSI: Ctrl id: C0
  oakd.l02: desc: Predictive failure: PD 0a(e2/s1)
  oakd.l02-2013-08-23 22:28:10.315: [ ADAPTER][1217067328] H Received new event from LSI: Ctrl id: C0
  oakd.l02: desc: Predictive failure: PD 0a(e2/s1)
  oakd.l02-2013-08-23 22:50:13.752: [ ADAPTER][1217067328] H Received new event from LSI: Ctrl id: C1
  oakd.l02: desc: Predictive failure: PD 17(e3/s1)
  oakd.l02-2013-08-23 22:50:13.753: [ ADAPTER][1217067328] H Received new event from LSI: Ctrl id: C1
  oakd.l02: desc: Predictive failure: PD 17(e3/s1)
13 : Version Information
  OAK : 2.7.0.0.0
  kernel : 2.6.39-400.111.1.el5uek
  mpt2sas : 16.05.01.00
  Multipath : 0.4.9
  Disk Firmware : 0B25
14 : OAK Conf Parms
  Device : queue_depth Timeout max_sectors_kb nr_requests read_ahead_kb scheduler
  /dev/sdm : 32 32 1024 4096 128 noop [deadline] cfq
  /dev/sdaw : 32 32 1024 4096 128 noop [deadline] cfq
******************************
********** 2nd NODE **********
******************************
The authenticity of host '192.168.16.25 (192.168.16.25)' can't be established.
RSA key fingerprint is dd:5b:37:cc:85:6b:b1:c4:8e:80:66:27:7f:b1:37:23.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.16.25' (RSA) to the list of known hosts.
Node Name : odarm2
Test : Diagnostic Test Description
1 : OAK Check
  NAME PATH TYPE STATE STATE_DETAILS
  pd_01 /dev/sdaw HDD ONLINE Good
2 : ASM Check
  ASM Disk Status : state mode_s mount_s header_s
3 : Smartctl Health Check
  SMART Health Status: FAILURE PREDICTION THRESHOLD EXCEEDED: ascq=0x5 [asc=5d, ascq=5]
4 : Multipathd Status
  multipathd running on system
5 : Multipath Status
  Device List : /dev/sdm /dev/sdaw
  Info:
  HDD_E0_S01_975092811 (35000c5003a1ebc4b) dm-14 SEAGATE,ST360057SSUN600G
  size=559G features='0' hwhandler='0' wp=rw
  |-+- policy='round-robin 0' prio=1 status=active
  | `- 6:0:10:0 sdm 8:192 active ready running
  `-+- policy='round-robin 0' prio=1 status=enabled
    `- 7:0:23:0 sdaw 67:0 active ready running
  IO Test Result:
  /dev/sdm : PASS
  /dev/sdaw : PASS
6 : Check Partition using fdisk
  Check using active device path: /dev/sdm
  Partition check on device /dev/sdm : FAIL
  Partition list found by fdisk for active device path: /dev/sdm
  Device Boot Start End Blocks Id System
  Check using passive device path: /dev/sdaw
  Partition check on device /dev/sdaw : FAIL
  Partition list found by fdisk for passive device path: /dev/sdaw
  Device Boot Start End Blocks Id System
7 : Device Mapper Diagnostics
  Mapper Device : dm-14
  IO Test Result:
  /dev/dm-14 : PASS
  [INFO]: No partition seen in /dev/mapper directory
8 : fwupdate
  ID Manufacturer Model Chassis Slot Type Media Size (GB) FW Version XML Support
  c1d1 SEAGATE ST360057SSUN600G 0 1 sas HDD 600 0B25 N/A
  c2d1 SEAGATE ST360057SSUN600G 0 1 sas HDD 600 0B25 N/A
9 : Fishwrap
  Controller "mpt2sas:0d:00.0"
  Disk /dev/sdm: SEAGATE ST360057SSUN600G (s/n "001116E0SQHG 6SL0SQHG"), bay 1
  Controller "mpt2sas:1f:00.0"
  Disk /dev/sdaw: SEAGATE ST360057SSUN600G (s/n "001116E0SQHG 6SL0SQHG"), bay 1
10 : SCSI INQUIRY
  Active multipath device /dev/sdm : PASS
  Passive multipath device /dev/sdaw : PASS
11 : Multipath Conf for device
  multipath {
  wwid 35000c5003a1ebc4b
  alias HDD_E0_S01_975092811
  }
12 : Last five LSI Events Received for slot 1
  oakd.l02-2013-08-23 22:28:10.240: [ ADAPTER][972740928] H Received new event from LSI: Ctrl id: C0
  oakd.l02: desc: Predictive failure: PD 0a(e2/s1)
  oakd.l02-2013-08-23 22:28:10.247: [ ADAPTER][972740928] H Received new event from LSI: Ctrl id: C0
  oakd.l02: desc: Predictive failure: PD 0a(e2/s1)
  oakd.l02-2013-08-23 22:28:37.846: [ ADAPTER][972740928] H Received new event from LSI: Ctrl id: C1
  oakd.l02: desc: Predictive failure: PD 17(e3/s1)
  oakd.l02-2013-08-23 22:28:37.847: [ ADAPTER][972740928] H Received new event from LSI: Ctrl id: C1
  oakd.l02: desc: Predictive failure: PD 17(e3/s1)
13 : Version Information
  OAK : 2.7.0.0.0
  kernel : 2.6.39-400.111.1.el5uek
  mpt2sas : 16.05.01.00
  Multipath : 0.4.9
  Disk Firmware : 0B25
14 : OAK Conf Parms
  Device : queue_depth Timeout max_sectors_kb nr_requests read_ahead_kb scheduler
  /dev/sdm : 32 32 1024 4096 128 noop [deadline] cfq
  /dev/sdaw : 32 32 1024 4096 128 noop [deadline] cfq
Above details can also be found in log file=/opt/oracle/oak/log/odarm1/stordiag/stordiag-2013-10-31-11:01:13.log
[root@odarm1 ~]#
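When comparing the "Last five LSI Events" sections from both nodes, it can help to tally the events per controller and slot. The following is a minimal sketch that assumes the event text looks like the stordiag sample above ("Ctrl id: C0 ... desc: Predictive failure: PD 0a(e2/s1)"); the function name and the regular expression are illustrative assumptions, not an Oracle-provided parser.

```python
import re
from collections import Counter

# Assumed event layout, taken from the stordiag sample output in this note:
#   ... Received new event from LSI: Ctrl id: C0
#   oakd.l02: desc: Predictive failure: PD 0a(e2/s1)
EVENT_RE = re.compile(
    r"Ctrl id: (C\d+).*?desc: ([^:\n]+?): PD [0-9a-f]+\((e\d+)/s(\d+)\)",
    re.DOTALL,
)


def summarize_lsi_events(text):
    """Count events by (controller, description, enclosure id, slot)."""
    counts = Counter()
    for ctrl, desc, encl, slot in EVENT_RE.findall(text):
        counts[(ctrl, desc, encl, int(slot))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: paste or pipe the stordiag log text on standard input.
    for key, n in sorted(summarize_lsi_events(sys.stdin.read()).items()):
        print(n, key)
```

The enclosure id is kept as the raw string reported by the controller (e2, e3 in the sample), since it does not use the same numbering as the e0/e1 resource names.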
For ODA V1: oakcli show disk pd_## -- where ## is the Physical Disk (pd) number. Also good for a quick diagnostic at the single disk level.
Comment
Oracle Appliance Kit Command Line Interface (oakcli) is used exclusively for configuring, installing, patching and administering the ODA. Oakcli commands are the preferred method for almost all ODA maintenance and administration, including the creation of databases, and are mandatory for importing VM templates and for patching or upgrades.
oakcli show disk pd_xx - where pd_xx is the name of the resource
odasundiag.sh odasundiag.sh Example Output from script:
Useful for debugging aspects of the problem -- usually after the initial problem is at least partially understood
ls -l /dev/mapper/*D* -- Generic: can be used to list all disks
ls -l /dev/mapper/HDD* -- HDD only
ls -l /dev/mapper/SSD* -- SSD only
Resource: pd_01
oakcli show disk Not Recommended : Potentially misleading
oakcli show disk -- Quick confirmation of the physical disks known to oakcli. If a disk has been physically removed, you might notice a missing reference and a gap in the Physical Disk identifiers. However, if you did not know that the range is pd_00 up to pd_23, you might miss one or more disks at the beginning or end of the range. Worst case, you might use this command and believe it confirms that there are no problems with any of the disks in use. This is not a good query for determining the health of a disk: the command appears to identify only whether a disk is in the slot and recognized as physically present.
ls -l /dev/mapper/HDD* - Lists HDD disks but gives no explicit warning of a missing disk
Use
No evidence of a problem source
Output is not consecutive, so it is not easy to spot which disk, if any, is missing
Does not list SSDs
Can be used to provide details on the currently detected mapped disks, including:
permissions; User owner | Group -- Should be grid | asmadmin
Date disk first created(?) | Date mapped post ASM (?)
device name, path(s) and node#
Example: ls -l /dev/mapper/*D*
Good
[root@odax3rm1 ~]# ls -l /dev/mapper/*D*
brw-rw---- 1 grid asmadmin 252, 23 Oct 24 11:59 /dev/mapper/HDD_E0_S00_372932264
brw-rw---- 1 grid asmadmin 252, 99 Nov 1 02:03 /dev/mapper/HDD_E0_S00_372932264p1
brw-rw---- 1 grid asmadmin 252, 100 Nov 1 02:03 /dev/mapper/HDD_E0_S00_372932264p2
brw-rw---- 1 grid asmadmin 252, 7 Oct 24 12:00 /dev/mapper/HDD_E0_S01_373745920
brw-rw---- 1 grid asmadmin 252, 74 Nov 1 02:03 /dev/mapper/HDD_E0_S01_373745920p1
brw-rw---- 1 grid asmadmin 252, 101 Nov 1 02:03 /dev/mapper/HDD_E0_S01_373745920p2
...
Bad
[root@odarm1 ~]# ls -l /dev/mapper/*D*
brw-rw---- 1 grid asmadmin 252, 26 Sep 10 19:17 /dev/mapper/HDD_E0_S00_975281119
brw-rw---- 1 grid asmadmin 252, 30 Oct 31 11:49 /dev/mapper/HDD_E0_S00_975281119p1
brw-rw---- 1 grid asmadmin 252, 33 Oct 31 11:21 /dev/mapper/HDD_E0_S00_975281119p2
brw-rw---- 1 grid asmadmin 252, 14 Oct 17 23:18 /dev/mapper/HDD_E0_S01_975092811 << only one of three references to slot S01: we are missing p1 and p2
brw-rw---- 1 grid asmadmin 252, 27 Sep 10 19:17 /dev/mapper/HDD_E0_S04_975101159 << Notice that this is not sequential: we go from S00 to S01 (partial), skip S02 and S03, and then reference S04
brw-rw---- 1 grid asmadmin 252, 35 Oct 31 11:49 /dev/mapper/HDD_E0_S04_975101159p1
brw-rw---- 1 grid asmadmin 252, 37 Oct 31 11:21 /dev/mapper/HDD_E0_S04_975101159p2
...
...
...
brw-rw---- 1 grid asmadmin 252, 4 Oct 17 23:18 /dev/mapper/SSD_E0_S20_805650933 -- missing p1
brw-rw---- 1 grid asmadmin 252, 12 Oct 17 23:18 /dev/mapper/SSD_E0_S21_805650925
brw-rw---- 1 grid asmadmin 252, 51 Oct 17 23:18 /dev/mapper/SSD_E0_S21_805650925p1
brw-rw---- 1 grid asmadmin 252, 10 Sep 10 19:17 /dev/mapper/SSD_E1_S22_805650984
brw-rw---- 1 grid asmadmin 252, 56 Oct 31 11:49 /dev/mapper/SSD_E1_S22_805650984p1
brw-rw---- 1 grid asmadmin 252, 5 Oct 17 23:18 /dev/mapper/SSD_E1_S23_805622321
brw-rw---- 1 grid asmadmin 252, 50 Oct 17 23:18 /dev/mapper/SSD_E1_S23_805622321p1
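Spotting a missing p1 or p2 entry by eye in a long listing is error prone. The sketch below checks a list of /dev/mapper names against the pattern seen in the samples in this note: each HDD appears as the base device plus p1 and p2, and each SSD as the base device plus p1. The function name and regular expression are assumptions made for illustration, and the check cannot report a disk that has no entries at all (such as the skipped slots S02 and S03 above).

```python
import re
from collections import defaultdict

# Name layout assumed from the samples in this note, e.g.
# /dev/mapper/HDD_E0_S01_975092811p1
NAME_RE = re.compile(r"((HDD|SSD)_E\d_S\d{2}_\d+)(p\d)?$")

# "" stands for the base (whole-disk) device entry.
EXPECTED = {"HDD": {"", "p1", "p2"}, "SSD": {"", "p1"}}


def missing_entries(names):
    """Return {base device name: [missing suffixes]} for incomplete disks."""
    seen = defaultdict(set)
    kinds = {}
    for name in names:
        match = NAME_RE.search(name.strip())
        if not match:
            continue  # e.g. the "control" node or unrelated devices
        base, kind, part = match.group(1), match.group(2), match.group(3) or ""
        seen[base].add(part)
        kinds[base] = kind
    report = {}
    for base, parts in seen.items():
        missing = EXPECTED[kinds[base]] - parts
        if missing:
            report[base] = sorted(missing)
    return report
```

The input can be bare names or full `ls -l` lines, because the pattern is anchored only at the end of each line.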
Doing a Count of the DISKS
[root@odax ~]# ls -l /dev/mapper/HDD* |wc -l
120
However, the above does not provide any information pointing to the problem with the MISSING SSD in Slot 23 (the last slot), or which JBOD the disk is in
[root@odax3rm1-net1 ~]# oakcli show diskgroup redo
Examples for V1:
[root@oda1 ~]# ls -l /dev/mapper/*D* |wc -l
64
[root@oda1 ~]# ls -l /dev/mapper/HDD* |wc -l
<< we are missing disks, but no details are provided about which disks, slots or states
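A count is only useful if you know what number to expect. As a rough sketch, assuming fully populated shelves with the layout shown in the samples in this note (20 HDDs in slots 00 to 19, each listed as base, p1 and p2, and 4 SSDs in slots 20 to 23, each listed as base and p1), the expected number of /dev/mapper entries can be computed as follows. The function name is hypothetical.

```python
HDD_PER_SHELF = 20  # slots 00-19, per the sample listings in this note
SSD_PER_SHELF = 4   # slots 20-23
HDD_ENTRIES = 3     # base device + p1 + p2
SSD_ENTRIES = 2     # base device + p1


def expected_mapper_entries(shelves, pattern="*D*"):
    """Expected `ls -l /dev/mapper/<pattern> | wc -l` on a healthy system.

    Assumes fully populated shelves and that the glob matches only
    the HDD_* and SSD_* device names.
    """
    hdd = shelves * HDD_PER_SHELF * HDD_ENTRIES
    ssd = shelves * SSD_PER_SHELF * SSD_ENTRIES
    return {"HDD*": hdd, "SSD*": ssd, "*D*": hdd + ssd}[pattern]
```

Under these assumptions, two shelves give 120 entries for HDD*, which matches the healthy count shown earlier, and a single set of 24 disks gives 68 entries for *D*, so the count of 64 in the V1 example is four entries short.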
[root@odarm1 ~]# oakcli show disk
NAME PATH TYPE STATE STATE_DETAILS
pd_00 /dev/sdam HDD ONLINE Good
pd_01 /dev/sdaw HDD ONLINE Good << This is showing as GOOD using oakcli show DISK -- but is BAD using oakcli show diskgroup data
pd_02 /dev/sdaa HDD ONLINE Good
pd_03 /dev/sdak HDD ONLINE Good
pd_04 /dev/sdan HDD ONLINE Good
pd_05 /dev/sdax HDD ONLINE Good
pd_06 /dev/sdab HDD ONLINE Good
pd_07 /dev/sdal HDD ONLINE Good
pd_08 /dev/sdao HDD ONLINE Good
pd_09 /dev/sdau HDD ONLINE Good
pd_10 /dev/sdac HDD ONLINE Good
pd_11 /dev/sdai HDD ONLINE Good
pd_12 /dev/sdap HDD ONLINE Good
pd_13 /dev/sdav HDD ONLINE Good
pd_14 /dev/sdad HDD ONLINE Good
pd_15 /dev/sdaj HDD ONLINE Good
pd_16 /dev/sdaq HDD ONLINE Good
pd_17 /dev/sdas HDD ONLINE Good
pd_18 /dev/sdae HDD ONLINE Good
pd_19 /dev/sdag HDD ONLINE Good
pd_20 /dev/sdar SSD ONLINE Good
pd_21 /dev/sdat SSD ONLINE Good
pd_22 /dev/sdaf SSD ONLINE Good
pd_23 /dev/sdah SSD ONLINE Good
However, there is a problem ...
[root@odarm1 ~]# oakcli show diskgroup data
ASM_DISK PATH DISK STATE STATE_DETAILS
data_00 /dev/mapper/HDD_E0_S00_975281119p1 pd_00 ONLINE Good
data_01 /dev/mapper/HDD_E0_S01_975092811p1 pd_01 OFFLINE Bad << This reported as GOOD using - oakcli show DISK
data_02 /dev/mapper/HDD_E1_S02_975112619p1 pd_02 ONLINE Good
data_03 /dev/mapper/HDD_E1_S03_975096419p1 pd_03 ONLINE Good
data_04 /dev/mapper/HDD_E0_S04_975101159p1 pd_04 ONLINE Good
data_05 /dev/mapper/HDD_E0_S05_975276323p1 pd_05 ONLINE Good
data_06 /dev/mapper/HDD_E1_S06_975286719p1 pd_06 ONLINE Good
data_07 /dev/mapper/HDD_E1_S07_975097763p1 pd_07 ONLINE Good
data_08 /dev/mapper/HDD_E0_S08_975059895p1 pd_08 ONLINE Good
data_09 /dev/mapper/HDD_E0_S09_975268579p1 pd_09 ONLINE Good
data_10 /dev/mapper/HDD_E1_S10_975057759p1 pd_10 ONLINE Good
data_11 /dev/mapper/HDD_E1_S11_975090571p1 pd_11 ONLINE Good
data_12 /dev/mapper/HDD_E0_S12_975082431p1 pd_12 ONLINE Good
data_13 /dev/mapper/HDD_E0_S13_975087695p1 pd_13 ONLINE Good
data_14 /dev/mapper/HDD_E1_S14_975098135p1 pd_14 ONLINE Good
data_15 /dev/mapper/HDD_E1_S15_975277375p1 pd_15 ONLINE Good
data_16 /dev/mapper/HDD_E0_S16_975053479p1 pd_16 ONLINE Good
data_17 /dev/mapper/HDD_E0_S17_975101955p1 pd_17 ONLINE Good
data_18 /dev/mapper/HDD_E1_S18_975105863p1 pd_18 ONLINE Good
data_19 /dev/mapper/HDD_E1_S19_975100435p1 pd_19 ONLINE Good
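The mismatch between oakcli show disk and oakcli show diskgroup is easy to overlook in two separate listings. The following sketch cross-checks captured output from both commands and reports every physical disk whose state differs between them. It assumes one row per line with five whitespace-separated columns, as in the samples in this note; the function names are hypothetical and the code only processes text you have already captured.

```python
def parse_rows(text):
    """Return the five-column data rows, skipping headers and other lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 5 and fields[0] not in ("NAME", "ASM_DISK"):
            rows.append(fields)
    return rows


def disagreements(show_disk_text, show_diskgroup_text):
    """List disks whose state differs between the two command outputs.

    Each result is (pd name, (state, details) from `show disk`,
    ASM disk name, (state, details) from `show diskgroup`).
    """
    disk_state = {
        name: (state, details)
        for name, _path, _type, state, details in parse_rows(show_disk_text)
    }
    found = []
    for asm_disk, _path, pd, state, details in parse_rows(show_diskgroup_text):
        if pd in disk_state and disk_state[pd] != (state, details):
            found.append((pd, disk_state[pd], asm_disk, (state, details)))
    return found
```

Rows carrying trailing annotations (such as the "<<" comments in the listings above) have more than five fields and are skipped by this sketch, so remove the annotations before feeding the text in.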
[root@odarm1 ~]# ls -l /dev/mapper/HDD*
brw-rw---- 1 grid asmadmin 252, 26 Sep 10 19:17 /dev/mapper/HDD_E0_S00_975281119
brw-rw---- 1 grid asmadmin 252, 30 Oct 31 11:20 /dev/mapper/HDD_E0_S00_975281119p1
brw-rw---- 1 grid asmadmin 252, 33 Oct 31 10:30 /dev/mapper/HDD_E0_S00_975281119p2
brw-rw---- 1 grid asmadmin 252, 14 Oct 17 23:18 /dev/mapper/HDD_E0_S01_975092811
brw-rw---- 1 grid asmadmin 252, 27 Sep 10 19:17 /dev/mapper/HDD_E0_S04_975101159
brw-rw---- 1 grid asmadmin 252, 35 Oct 31 11:20 /dev/mapper/HDD_E0_S04_975101159p1
brw-rw---- 1 grid asmadmin 252, 37 Oct 31 10:30 /dev/mapper/HDD_E0_S04_975101159p2
...
...
brw-rw---- 1 grid asmadmin 252, 25 Sep 10 19:17 /dev/mapper/HDD_E1_S15_975277375
brw-rw---- 1 grid asmadmin 252, 34 Oct 31 11:20 /dev/mapper/HDD_E1_S15_975277375p1
brw-rw---- 1 grid asmadmin 252, 36 Oct 31 10:30 /dev/mapper/HDD_E1_S15_975277375p2
brw-rw---- 1 grid asmadmin 252, 20 Sep 10 19:17 /dev/mapper/HDD_E1_S18_975105863
brw-rw---- 1 grid asmadmin 252, 38 Oct 31 11:20 /dev/mapper/HDD_E1_S18_975105863p1
brw-rw---- 1 grid asmadmin 252, 39 Oct 31 10:30 /dev/mapper/HDD_E1_S18_975105863p2
brw-rw---- 1 grid asmadmin 252, 21 Sep 10 19:17 /dev/mapper/HDD_E1_S19_975100435
brw-rw---- 1 grid asmadmin 252, 48 Oct 31 11:20 /dev/mapper/HDD_E1_S19_975100435p1
brw-rw---- 1 grid asmadmin 252, 49 Oct 31 10:30 /dev/mapper/HDD_E1_S19_975100435p2
[root@odax1 mpath]# ls -altr
0 lrwxrwxrwx 1 root root 7 Dec 9 15:10 SSD_E0_S22_805852510 -> ../dm-2
0 lrwxrwxrwx 1 root root 7 Dec 9 15:10 SSD_E1_S22_805861570 -> ../dm-6
0 lrwxrwxrwx 1 root root 8 Dec 9 15:10 HDD_E1_S06_575232712 -> ../dm-18
0 lrwxrwxrwx 1 root root 8 Dec 9 15:10 HDD_E0_S04_373259068 -> ../dm-20
0 lrwxrwxrwx 1 root root 8 Dec 9 15:10 HDD_E1_S11_575233388 -> ../dm-24
...
'fwupdate list disk' shows the controller
[root@oda1 ~]# fwupdate list disk
================================================== CONTROLLER ================================================== ID Type Manufacturer Model Product Name FW Version BIOS Version EFI Version FCODE Version Package Version NVDATA Version XML Support -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- c0 SAS LSI Logic 0x0072 SGX-SAS6-INT-Z 11.05.02.00 07.21.04.00 07.18.02.11 01.00.60.00 - 10.03.00.26 N/A DISKS =============== ID Manufacturer Model Chassis Slot Type Media Size (GB) FW Version XML Support ----------------------------------------------------------------------------------------------------------- c0d0 HITACHI H109060SESUN600G - 0 - HDD 600 A31A N/A c0d1 HITACHI H109060SESUN600G - 1 - HDD 600 A31A N/A ================================================== CONTROLLER ================================================== ID Type Manufacturer Model Product Name FW Version BIOS Version EFI Version FCODE Version Package Version NVDATA Version XML Support -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- c1 SAS LSI Logic 0x0072 SGX-SAS6-EXT-Z 11.05.02.00 07.21.04.00 07.18.02.07 01.00.60.00 - 10.03.00.24 N/A DISKS =============== ID Manufacturer Model Chassis Slot Type Media Size(GB) FW Version XML Support ----------------------------------------------------------------------------------------------------------- c1d0 HITACHI H109090SESUN900G 0 0 sas HDD 900 A31A N/A c1d1 HITACHI H109090SESUN900G 0 1 sas HDD 900 A31A N/A c1d2 HITACHI H109090SESUN900G 0 2 sas HDD 900 A31A N/A c1d3 HITACHI H109090SESUN900G 0 3 sas HDD 900 A31A N/A c1d4 HITACHI H109090SESUN900G 0 4 sas HDD 900 A31A N/A c1d5 HITACHI H109090SESUN900G 0 5 sas HDD 900 A31A N/A c1d6 HITACHI H109090SESUN900G 0 6 sas HDD 900 A31A N/A c1d7 HITACHI H109090SESUN900G 0 7 sas HDD 900 
A31A N/A c1d8 HITACHI H109090SESUN900G 0 8 sas HDD 900 A31A N/A c1d9 HITACHI H109090SESUN900G 0 9 sas HDD 900 A31A N/A c1d10 HITACHI H109090SESUN900G 0 10 sas HDD 900 A31A N/A c1d11 HITACHI H109090SESUN900G 0 11 sas HDD 900 A31A N/A c1d12 HITACHI H109090SESUN900G 0 12 sas HDD 900 A31A N/A c1d13 HITACHI H109090SESUN900G 0 13 sas HDD 900 A31A N/A c1d14 HITACHI H109090SESUN900G 0 14 sas HDD 900 A31A N/A c1d15 HITACHI H109090SESUN900G 0 15 sas HDD 900 A31A N/A c1d16 HITACHI H109090SESUN900G 0 16 sas HDD 900 A31A N/A c1d17 HITACHI H109090SESUN900G 0 17 sas HDD 900 A31A N/A c1d18 HITACHI H109090SESUN900G 0 18 sas HDD 900 A31A N/A c1d19 HITACHI H109090SESUN900G 0 19 sas HDD 900 A31A N/A c1d20 STEC Z16IZF4EUSUN200G 0 20 sas SSD 200 9432 N/A c1d21 STEC Z16IZF4EUSUN200G 0 21 sas SSD 200 9432 N/A c1d22 STEC Z16IZF4EUSUN200G 0 22 sas SSD 200 9432 N/A c1d23 STEC Z16IZF4EUSUN200G 0 23 sas SSD 200 9432 N/A c1d24 HITACHI H109090SESUN900G 1 0 sas HDD 900 A31A N/A c1d25 HITACHI H109090SESUN900G 1 1 sas HDD 900 A31A N/A c1d26 HITACHI H109090SESUN900G 1 2 sas HDD 900 A31A N/A c1d27 HITACHI H109090SESUN900G 1 3 sas HDD 900 A31A N/A c1d28 HITACHI H109090SESUN900G 1 4 sas HDD 900 A31A N/A c1d29 HITACHI H109090SESUN900G 1 5 sas HDD 900 A31A N/A c1d30 HITACHI H109090SESUN900G 1 6 sas HDD 900 A31A N/A c1d31 HITACHI H109090SESUN900G 1 7 sas HDD 900 A31A N/A c1d32 HITACHI H109090SESUN900G 1 8 sas HDD 900 A31A N/A c1d33 HITACHI H109090SESUN900G 1 9 sas HDD 900 A31A N/A c1d34 HITACHI H109090SESUN900G 1 10 sas HDD 900 A31A N/A c1d35 HITACHI H109090SESUN900G 1 11 sas HDD 900 A31A N/A c1d36 HITACHI H109090SESUN900G 1 12 sas HDD 900 A31A N/A c1d37 HITACHI H109090SESUN900G 1 13 sas HDD 900 A31A N/A c1d38 HITACHI H109090SESUN900G 1 14 sas HDD 900 A31A N/A c1d39 HITACHI H109090SESUN900G 1 15 sas HDD 900 A31A N/A c1d40 HITACHI H109090SESUN900G 1 16 sas HDD 900 A31A N/A c1d41 HITACHI H109090SESUN900G 1 17 sas HDD 900 A31A N/A c1d42 HITACHI H109090SESUN900G 1 18 sas HDD 900 A31A N/A c1d43 HITACHI 
H109090SESUN900G 1 19 sas HDD 900 A31A N/A c1d44 STEC Z16IZF4EUSUN200G 1 20 sas SSD 200 9432 N/A c1d45 STEC Z16IZF4EUSUN200G 1 21 sas SSD 200 9432 N/A c1d46 STEC Z16IZF4EUSUN200G 1 22 sas SSD 200 9432 N/A c1d47 STEC Z16IZF4EUSUN200G 1 23 sas SSD 200 9432 N/A ================================================== CONTROLLER ================================================== ID Type Manufacturer Model Product Name FW Version BIOS Version EFI Version FCODE Version Package Version NVDATA Version XML Support -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- c2 SAS LSI Logic 0x0072 SGX-SAS6-EXT-Z 11.05.02.00 07.21.04.00 07.18.02.07 01.00.60.00 - 10.03.00.24 N/A DISKS =============== ID Manufacturer Model Chassis Slot Type Media Size(GB) FW Version XML Support ----------------------------------------------------------------------------------------------------------- c2d0 HITACHI H109090SESUN900G 0 0 sas HDD 900 A31A N/A c2d1 HITACHI H109090SESUN900G 0 1 sas HDD 900 A31A N/A c2d2 HITACHI H109090SESUN900G 0 2 sas HDD 900 A31A N/A c2d3 HITACHI H109090SESUN900G 0 3 sas HDD 900 A31A N/A c2d4 HITACHI H109090SESUN900G 0 4 sas HDD 900 A31A N/A c2d5 HITACHI H109090SESUN900G 0 5 sas HDD 900 A31A N/A c2d6 HITACHI H109090SESUN900G 0 6 sas HDD 900 A31A N/A c2d7 HITACHI H109090SESUN900G 0 7 sas HDD 900 A31A N/A c2d8 HITACHI H109090SESUN900G 0 8 sas HDD 900 A31A N/A c2d9 HITACHI H109090SESUN900G 0 9 sas HDD 900 A31A N/A c2d10 HITACHI H109090SESUN900G 0 10 sas HDD 900 A31A N/A c2d11 HITACHI H109090SESUN900G 0 11 sas HDD 900 A31A N/A c2d12 HITACHI H109090SESUN900G 0 12 sas HDD 900 A31A N/A c2d13 HITACHI H109090SESUN900G 0 13 sas HDD 900 A31A N/A c2d14 HITACHI H109090SESUN900G 0 14 sas HDD 900 A31A N/A c2d15 HITACHI H109090SESUN900G 0 15 sas HDD 900 A31A N/A c2d16 HITACHI H109090SESUN900G 0 16 sas HDD 900 A31A N/A c2d17 HITACHI H109090SESUN900G 0 17 sas HDD 900 A31A N/A 
c2d18 HITACHI H109090SESUN900G 0 18 sas HDD 900 A31A N/A c2d19 HITACHI H109090SESUN900G 0 19 sas HDD 900 A31A N/A c2d20 STEC Z16IZF4EUSUN200G 0 20 sas SSD 200 9432 N/A c2d21 STEC Z16IZF4EUSUN200G 0 21 sas SSD 200 9432 N/A c2d22 STEC Z16IZF4EUSUN200G 0 22 sas SSD 200 9432 N/A c2d23 STEC Z16IZF4EUSUN200G 0 23 sas SSD 200 9432 N/A c2d24 HITACHI H109090SESUN900G 1 0 sas HDD 900 A31A N/A c2d25 HITACHI H109090SESUN900G 1 1 sas HDD 900 A31A N/A c2d26 HITACHI H109090SESUN900G 1 2 sas HDD 900 A31A N/A c2d27 HITACHI H109090SESUN900G 1 3 sas HDD 900 A31A N/A c2d28 HITACHI H109090SESUN900G 1 4 sas HDD 900 A31A N/A c2d29 HITACHI H109090SESUN900G 1 5 sas HDD 900 A31A N/A c2d30 HITACHI H109090SESUN900G 1 6 sas HDD 900 A31A N/A c2d31 HITACHI H109090SESUN900G 1 7 sas HDD 900 A31A N/A c2d32 HITACHI H109090SESUN900G 1 8 sas HDD 900 A31A N/A c2d33 HITACHI H109090SESUN900G 1 9 sas HDD 900 A31A N/A c2d34 HITACHI H109090SESUN900G 1 10 sas HDD 900 A31A N/A c2d35 HITACHI H109090SESUN900G 1 11 sas HDD 900 A31A N/A c2d36 HITACHI H109090SESUN900G 1 12 sas HDD 900 A31A N/A c2d37 HITACHI H109090SESUN900G 1 13 sas HDD 900 A31A N/A c2d38 HITACHI H109090SESUN900G 1 14 sas HDD 900 A31A N/A c2d39 HITACHI H109090SESUN900G 1 15 sas HDD 900 A31A N/A c2d40 HITACHI H109090SESUN900G 1 16 sas HDD 900 A31A N/A c2d41 HITACHI H109090SESUN900G 1 17 sas HDD 900 A31A N/A c2d42 HITACHI H109090SESUN900G 1 18 sas HDD 900 A31A N/A c2d43 HITACHI H109090SESUN900G 1 19 sas HDD 900 A31A N/A c2d44 STEC Z16IZF4EUSUN200G 1 20 sas SSD 200 9432 N/A c2d45 STEC Z16IZF4EUSUN200G 1 21 sas SSD 200 9432 N/A c2d46 STEC Z16IZF4EUSUN200G 1 22 sas SSD 200 9432 N/A c2d47 STEC Z16IZF4EUSUN200G 1 23 sas SSD 200 9432 N/A
# cd /dev/mapper
[root@oda mapper]# ls -al
By NAME -- helpful as this allows you to check the disks in order 0-23
...
crw------- 1 root root 10, 236 Dec 9 15:09 control
brw-rw---- 1 grid asmadmin 252, 12 Dec 9 15:10 HDD_E0_S00_372932264
brw-rw---- 1 grid asmadmin 252, 64 Dec 13 01:30 HDD_E0_S00_372932264p1
brw-rw---- 1 grid asmadmin 252, 65 Dec 10 06:10 HDD_E0_S00_372932264p2 << Notice each HDD disk should list as [], p1 and p2
brw-rw---- 1 grid asmadmin 252, 13 Dec 9 15:10 HDD_E0_S01_373745920
brw-rw---- 1 grid asmadmin 252, 82 Dec 13 01:35 HDD_E0_S01_373745920p1
brw-rw---- 1 grid asmadmin 252, 83 Dec 10 06:10 HDD_E0_S01_373745920p2
...
...
brw-rw---- 1 grid asmadmin 252, 6 Dec 9 15:10 SSD_E1_S22_805861570
brw-rw---- 1 grid asmadmin 252, 96 Dec 9 15:10 SSD_E1_S22_805861570p1 << Notice each SSD disk should list as [] and p1 only
brw-rw---- 1 grid asmadmin 252, 7 Dec 9 15:10 SSD_E1_S23_805820183
brw-rw---- 1 grid asmadmin 252, 113 Dec 9 15:10 SSD_E1_S23_805820183p1
[root@oda mapper]# ls -altr
By TIME -- helpful as this allows you to check the last disks added during replacement / problem troubleshooting
...
crw------- 1 root root 10, 236 Dec 9 15:09 control
brw-rw---- 1 grid asmadmin 252, 7 Dec 2 15:10 SSD_E1_S23_805820183
brw-rw---- 1 grid asmadmin 252, 6 Dec 2 15:10 SSD_E1_S22_805861570
brw-rw---- 1 grid asmadmin 252, 5 Dec 2 15:10 SSD_E1_S21_805861578
...
...
brw-rw---- 1 grid asmadmin 252, 67 Dec 13 01:49 SSD_E0_S20_805852554p1
brw-rw---- 1 grid asmadmin 252, 48 Dec 13 01:49 SSD_E0_S22_805852510p1
brw-rw---- 1 grid asmadmin 252, 98 Dec 13 01:49 HDD_E0_S17_372466360p1
CHECK THE SHELF / JBOD ID, FW and if Primary or Secondary - may be useful for problems with Second JBOD installation issues
[root@oda1~]# fwupdate list expander
================================================== CONTROLLER ==================================================
ID Type Manufacturer Model Product Name FW Version BIOS Version EFI Version FCODE Version Package Version NVDATA Version XML Support
--------------------------------------------------------------------------------
c0 SAS LSI Logic 0x0072 SGX-SAS6-EXT-Z 11.05.03.00 07.21.09.00 07.22.05.00 01.00.62.00 - 10.03.00.32 N/A
EXPANDERS
===============
ID Chassis Slot Manufacturer Model Expander Name FW Version XML Support
--------------------------------------------------------------------------------
c0x0 0 - ORACLE DE2-24P Primary 0018 N/A
c0x1 1 - ORACLE DE2-24P Primary 0018 N/A
================================================== CONTROLLER ==================================================
ID Type Manufacturer Model Product Name FW Version BIOS Version EFI Version FCODE Version Package Version NVDATA Version XML Support
--------------------------------------------------------------------------------
c1 SAS LSI Logic 0x0072 SGX-SAS6-EXT-Z 11.05.03.00 07.21.09.00 07.22.05.00 01.00.62.00 - 10.03.00.32 N/A
EXPANDERS
===============
ID Chassis Slot Manufacturer Model Expander Name FW Version XML Support
--------------------------------------------------------------------------------
c1x0 0 - ORACLE DE2-24P Secondary 0018 N/A
c1x1 1 - ORACLE DE2-24P Secondary 0018 N/A
[root@odax3rm1 ~]# mdadm --detail /dev/md0
oakcli validate -c OSDiskStorage
[root@odax3rm1-net1 ~]# oakcli validate -c OSDiskStorage
INFO: Checking Operating System Storage
SUCCESS: The OS disks have the boot stamp
RESULT: Logical Volume No volume groups found in Volume group is of size
RESULT: Device /dev/xvda2 is mounted on / of type ext3 in (rw)
RESULT: Device /dev/xvda1 is mounted on /boot of type ext3 in (rw)
RESULT: Device /dev/xvdb1 is mounted on /u01 of type ext3 in (rw)
RESULT: / has 31100 MB free out of total 55852 MB
RESULT: /boot has 393 MB free out of total 460 MB
RESULT: /u01 has 50489 MB free out of total 93868 MB
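The free-space RESULT lines above are easier to compare as percentages. The sketch below is one possible way to compute them from saved output. It assumes one message per line and the exact "has N MB free out of total M MB" wording shown in this note, and the function name `os_disk_free_pct` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read saved "oakcli validate -c OSDiskStorage" output on
# stdin and print each mount point with its free space as a whole percentage.
# Assumption: lines look like "RESULT: / has 31100 MB free out of total 55852 MB".
os_disk_free_pct() {
    awk '$1 == "RESULT:" && $3 == "has" && $5 == "MB" && $6 == "free" && $10 > 0 {
        # $2 = mount point, $4 = free MB, $10 = total MB; int() truncates.
        printf "%s %d%%\n", $2, int($4 * 100 / $10)
    }'
}
```

For the output shown above this gives roughly 55% free on /, 85% on /boot and 53% on /u01. Other RESULT lines, such as the mount and volume group messages, are ignored.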
oakcli validate -c StorageTopology
[root@odax3rm1-net1 ~]# oakcli validate -c StorageTopology
It may take a while. Please wait...
INFO : ODA Topology Verification
INFO : Running on Node0
INFO : Check hardware type
SUCCESS : Type of hardware found : X3-2
INFO : Check for Environment(Bare Metal or Virtual Machine)
SUCCESS : Type of environment found : Virtual Machine(ODA BASE)
SUCCESS : Number of External LSI SAS controller found : 2
INFO : Check for Controllers correct PCIe slot address
SUCCESS : External LSI SAS controller 0 : 00:15.0
SUCCESS : External LSI SAS controller 1 : 00:16.0
INFO : Check if JBOD powered on
SUCCESS : 2JBOD : Powered-on
INFO : Check for correct number of EBODS(2 or 4)
SUCCESS : EBOD found : 4
INFO : Check for External Controller 0
SUCCESS : Cable check for port 0 on controller 0
SUCCESS : Cable check for port 1 on controller 0
SUCCESS : Overall Cable check for controller 0
INFO : Check for External Controller 1
SUCCESS : Cable check for port 0 on controller 1
SUCCESS : Cable check for port 1 on controller 1
SUCCESS : Overall Cable check for controller 1
INFO : Check for overall status of cable validation on Node0
SUCCESS : Overall Cable Validation on Node0
SUCCESS : JBOD0 Nickname set correctly : Oracle Database Appliance - E0
SUCCESS : JBOD1 Nickname set correctly : Oracle Database Appliance - E1
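In a long topology report it can help to hide the INFO and SUCCESS lines and look only at what remains. The sketch below does that for saved output with one message per line. It deliberately does not assume what a failure tag looks like, since this note shows only successful checks; any "TAG : message" line whose tag is neither INFO nor SUCCESS is printed. The function name `topology_problems` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read saved "oakcli validate -c StorageTopology" output
# on stdin and print every "TAG : message" line whose tag is not INFO or SUCCESS.
# Lines without a colon (prompts, "Please wait...") are skipped, so a failure
# reported without a colon would be missed; check the raw output if in doubt.
topology_problems() {
    awk -F':' 'NF > 1 {
        tag = $1
        gsub(/[ \t]+/, "", tag)   # strip the padding around the tag
        if (tag != "INFO" && tag != "SUCCESS") print
    }'
}
```

No output means every tagged line was INFO or SUCCESS, as in the example above.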
[root@odax3rm1-net1 ~]# oakcli validate -c SharedStorage
[root@odax3rm1-net1 ~]# oakcli show validation storage failures
This command shows soft validation failures. Nothing is reported in this case; the returned prompt is just confirmation that the command was executed.
[root@odax3rm1-net1 ~]#
Same system, different command:
[root@odax3rm1-net1 ~]# oakcli show storage -errors
ERROR: Disk e1_pd_23 [/dev/sdh] 35000a7203007d717@1327FM4013 belongs to another host's chassis#: 1252FM400F]
On the same system, asmappl.config shows SSD slot 23 for both JBODs:
[root@odax3rm1-net1 ~]# grep S23 /opt/oracle/extapi/asmappl.config
disk /dev/mapper/SSD_E0_S23_805852551p1 0 23 1
disk /dev/mapper/SSD_E1_S23_805820183p1 1 23 1 < no evidence of a problem
Yet oakcli shows the same disk as removed:
[root@odax3rm1-net1 ~]# oakcli show diskgroup redo
ASM_DISK    PATH                                    DISK      STATE   STATE_DETAILS
e0_redo_20  /dev/mapper/SSD_E0_S20_805852554p1      e0_pd_20  ONLINE  Good
e0_redo_21  /dev/mapper/SSD_E0_S21_805852541p1      e0_pd_21  ONLINE  Good
e0_redo_22  /dev/mapper/SSD_E0_S22_805852510p1      e0_pd_22  ONLINE  Good
e0_redo_23  /dev/mapper/SSD_E0_S23_805852551p1      e0_pd_23  ONLINE  Good
e1_redo_20  /dev/mapper/SSD_E1_S20_805861591p1      e1_pd_20  ONLINE  Good
e1_redo_21  /dev/mapper/SSD_E1_S21_805861578p1      e1_pd_21  ONLINE  Good
e1_redo_22  /dev/mapper/SSD_E1_S22_805861570p1      e1_pd_22  ONLINE  Good
e1_redo_23  /dev/mapper/SSD_E1_S23_805820183p1      e1_pd_23  FAILED  DiskRemoved
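When a disk group has many members, the one disk that is not ONLINE / Good is easy to miss. The sketch below filters saved `oakcli show diskgroup` output for such rows. It assumes the five whitespace-separated columns shown above (ASM_DISK, PATH, DISK, STATE, STATE_DETAILS) with a single-word STATE_DETAILS value, and the function name `diskgroup_problems` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read saved "oakcli show diskgroup <name>" output on
# stdin and print disk name, state and state details for every row that is
# not ONLINE / Good.
# Assumption: columns are ASM_DISK PATH DISK STATE STATE_DETAILS, as in this note.
diskgroup_problems() {
    awk 'NF >= 5 && $1 != "ASM_DISK" && ($4 != "ONLINE" || $5 != "Good") {
        print $3, $4, $5
    }'
}
```

For the redo disk group shown above, this prints a single line, `e1_pd_23 FAILED DiskRemoved`, which matches the disk flagged by `oakcli show storage -errors`.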
References
Note 1519879.1 - ODA (Oracle Database Appliance) and ASM 2.1 up to 2.10 Storage Options for V1, X3-2 and X4-2 Hardware
Note 1435946.1 - How to Replace an ODA (Oracle Database Appliance) FAILED/ PredictiveFail Shared Storage Disk
Note 1496114.1 - ODA (Oracle Database Appliance): The Steps to replace multiple disks failing concurrently
Note 550569.1 - R12: Vertex Data - How to integrate
Note 1550569.1 - How to Troubleshoot OS disk issues on the Oracle Database Appliance
Note 1401471.1 - ODA After replacing a disk on Oracle Database Appliance the new disk is not added to ASM 2.1 to 2.4
Note 1382300.1 - ODA (Oracle Database Appliance) : How to replace FAILED SYSTEM BOOT DISK
Note 1420126.1 - ODA (Oracle Database Appliance) Different Disks Randomly Disappear After a Reboot
Note 1990134.1 - Replace new disk failed at ODA (Oracle Database Appliance) 12 version
Note 1497610.1 - Determining when Disks should be replaced on Oracle Database Appliance
Note 1390058.1 - Oracle Database Appliance Diagnostic Information required for Disk Failures
Note 1457254.1 - ODA (Oracle Database Appliance): after disk failure some disks are in ASM mount_status 'CLOSED'
Note 1536486.1 - Replaced ODA drive lists as “UNKNOWN PARTIAL PathsNotLoaded”

Attachments
This solution has no attachment