Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Technical Instruction Sure Solution

1496114.1 : ODA (Oracle Database Appliance): The Steps to replace multiple disks failing concurrently
Applies to:

Oracle Database Appliance - Version All Versions to All Versions [Release All Releases]
Oracle Database Appliance Software - Version 2.1.0.1 to 2.9.0.0 [Release 2.1 to 2.9]
Information in this document applies to any platform.

Goal

This article describes the steps to follow when replacing failing disks on an ODA (Oracle Database Appliance).

Solution

If one or more disks need to be replaced, follow these steps:

1. Check the current disk status

Check the current status of your shared storage disks from both the ASM and the OAK perspective. Make sure all other disks are in good shape and that ASM redundancy will not be affected by removing the disk you intend to replace.

- From an ASM point of view (logged in as user grid), issue the following commands:

    export ORACLE_SID=+ASM1
    asmcmd lsdsk -p

  A correctly working disk shows the status CACHED; MEMBER; ONLINE; NORMAL.

  Check for a negative Usable_file_MB value with:

    asmcmd lsdg

  If Usable_file_MB is negative, check the reason:
  a) Disks are missing (the candidate disk for replacement): continue with the procedure.
  b) Space is overallocated: free some disk space in ASM first, for example by removing archive logs that have already been backed up.
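The Usable_file_MB check above can be scripted. The following is a minimal sketch, not part of the original procedure: the function name `negative_usable_dgs` is hypothetical, and it assumes that the `asmcmd lsdg` header row contains the column names `Usable_file_MB` and `Name` as single whitespace-separated tokens. The columns are located by header name, so their position is not assumed.

```shell
# Print the name of every disk group whose Usable_file_MB is negative.
# Reads `asmcmd lsdg` output on stdin. Assumption: the first line is the
# header row and contains the tokens "Usable_file_MB" and "Name".
negative_usable_dgs() {
  awk '
    NR == 1 {
      # Find the column positions from the header instead of hard-coding them.
      for (i = 1; i <= NF; i++) {
        if ($i == "Usable_file_MB") u = i
        if ($i == "Name") n = i
      }
      next
    }
    # Only report rows once both columns were found in the header.
    u && n && ($u + 0) < 0 { print $n }
  '
}

# Example (not executed here), run as the grid user:
#   asmcmd lsdg | negative_usable_dgs
```

An empty result means no disk group reported a negative value; any name printed should be investigated as described in a) and b) before a disk is pulled.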
- From an OAK point of view, issue the commands:

    oakcli show disk
    oakcli show diskgroup DATA
    oakcli show diskgroup REDO
    oakcli show diskgroup RECO

2. Take a backup of your databases and cloud file systems (ACFS) in case something goes wrong

3. Identify the failed disk

To identify the disk that needs to be replaced, issue the following command to turn on the LED on the disk.

ODA V1:

    oakcli locate disk pd_xx on
(where xx is a number in the range 01 to 23)

ODA X3-2 and higher:

    oakcli locate disk eX_pd_xx on

(where X is 0 or 1 and xx is a number in the range 01 to 23)

4. Monitor the disk operations

Oracle recommends monitoring the disk operations by tailing the ASM alert log on both nodes during disk replacement:

    <node1> tail -f /u01/app/grid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log
    <node2> tail -f /u01/app/grid/diag/asm/+asm/+ASM2/trace/alert_+ASM2.log

Look for events showing that the disk was removed, the disk was added, and the disk group was rebalanced.

5. Pull out the bad disk

Wait until the disk has been removed from ASM. You can verify that the disk has been removed with the following commands:

    fwupdate list disk
    oakcli show disk pd_<slotnumber>
    oakcli show diskgroup
    grid> asmcmd lsdsk -p -t | grep <Slotnumber - Sxx>

6. Insert the new disk

7. Check the status of the newly inserted disk

Wait until the disk goes online. Check with:

ODA V1:

    oakcli show disk pd_<slotnumber>
    for example: oakcli show disk pd_16

ODA X3-2 and higher:

    oakcli show disk e<jbod_number>_pd_<slotnumber>
    for example: oakcli show disk e0_pd_16

and

    grid> asmcmd lsdsk -p -t | grep <Slotnumber - Sxx>

If the disk does not go online after 5 minutes, restart oak (logged in as root):

    oakcli restart oak
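The five-minute wait above can be automated with a small polling loop. This is only a sketch under stated assumptions: `wait_until` and `disk_online` are hypothetical helper names, the 15-second polling interval is an arbitrary choice, and `disk_online` assumes that the `asmcmd lsdsk -p -t` line for the slot contains the word ONLINE once the disk is usable.

```shell
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or TIMEOUT_SECONDS have elapsed.
# Returns 0 when COMMAND succeeded, 1 on timeout.
wait_until() {
  timeout=$1
  interval=$2
  shift 2
  elapsed=0
  while ! "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 0
}

# Hypothetical check: succeeds when the lsdsk line for the given slot tag
# (for example S16) contains ONLINE. Intended to be run as the grid user.
disk_online() {
  asmcmd lsdsk -p -t | grep "$1" | grep -q ONLINE
}

# Example (not executed here): wait up to 5 minutes, polling every 15 seconds.
#   wait_until 300 15 disk_online S16 || echo "disk not online after 5 minutes"
```

The loop only reports whether the disk came online in time; restarting oak or reseating the disk remains a manual decision.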
Update:

If the disk does not come online, pull it out, insert it again, and check the status.

If the disk still does not come ONLINE, you may want to try the following:

1- Verify that the disk has not been added to ASM
2- Find the disk device
3- dd the initial disk area, for example:
    dd if=/dev/zero of=/dev/mapper/HDD_E1_S19_372682224 bs=8192 count=1000
4- Remove the disk
5- Wait for 3 minutes
6- Reinsert the disk
7- Wait for 3 minutes
8- Check the oak disk status: "oakcli show disk"
8. Check the ASM status of the new disk

As the grid OS user, verify that the disk has been accepted by ASM; its status should be MEMBER or initializing:

    grid> asmcmd lsdsk -p -t --member | grep <Slotnumber - Sxx>

Compare the disk number (path) with the slot number in the lsdsk output. Contact Support if these numbers do not match. DO NOT REPLACE MORE DISKS.

If the disk was not added, add it manually using the disk name reported by the asmcmd output above:

    grid> sqlplus / as sysasm
    SQL> alter diskgroup /*+ _OAK_AsmCookie */ DATA add disk '/dev/mapper/HDD_E0_S04_971463627p1' name HDD_E0_S04_971463627p1;
    SQL> alter diskgroup /*+ _OAK_AsmCookie */ RECO add disk '/dev/mapper/HDD_E0_S04_971463627p2' name HDD_E0_S04_971463627p2;

If the command does not work properly, contact Oracle Support.

DO NOT CONTINUE WITH THE NEXT DISK UNTIL THE DISK IS ACCEPTED BY ASM!
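The comparison between the device path and the slot number in this step can be sketched as a small helper. It is illustrative only: the function name `slot_of_path` is hypothetical, and it assumes device names follow the pattern seen in the examples in this note (a prefix such as HDD, then `_E<n>_S<nn>_`, then a serial number).

```shell
# Extract the enclosure/slot tag (for example E0_S04) from a device path such
# as /dev/mapper/HDD_E0_S04_971463627p1.
# Prints nothing and returns 1 if the path does not follow that pattern.
slot_of_path() {
  tag=$(printf '%s\n' "$1" |
    sed -n 's|^.*/[A-Z]*_\(E[0-9][0-9]*_S[0-9][0-9]*\)_.*$|\1|p')
  [ -n "$tag" ] && printf '%s\n' "$tag"
}

# Example (not executed here): compare against the slot that was replaced.
#   [ "$(slot_of_path /dev/mapper/HDD_E0_S04_971463627p1)" = "E0_S04" ] \
#     || echo "path and slot do not match: contact Support"
```

A mismatch reported by such a check should be treated exactly as described above: stop and contact Support before touching another disk.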
9. Check ASM rebalance operation

If you need to replace more than one disk, check in ASM that the rebalance has finished:

    grid> asmcmd lsdg

(check the REBAL column: Y indicates that a rebalance is still in progress)

or by executing the following query:

Rebalance the disk groups (optional, if not started automatically by ASM):

    grid> asmcmd rebal DATA --power 11 -w
    grid> asmcmd rebal RECO --power 11 -w

(-w waits and prints "rebalance complete" when finished)

When the rebalance has finished, continue with the next disk.

References

<NOTE:1457254.1> - ODA (Oracle Database Appliance): after disk failure some disks are in ASM mount_status 'CLOSED'
<NOTE:1382300.1> - ODA (Oracle Database Appliance) : How to replace FAILED SYSTEM BOOT DISK
<NOTE:1534154.1> - Oracle Database Appliance FCO 0328 Disk Replacement Procedure

Attachments

This solution has no attachment