
Asset ID: 1-72-1461219.1
Update Date: 2017-05-18
Keywords:

Solution Type: Problem Resolution (Sure)

Solution 1461219.1 : ODA HW: After SSD replacement, disk shows STATE as UNINITIALIZED and STATE_DETAILS as NewDiskInserted


Related Items
  • Oracle Database Appliance Software
  • Oracle Database Appliance
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>Oracle Database Appliance>DB: ODA_EST


After disk replacement, disk shows STATE as UNINITIALIZED and STATE_DETAILS as NewDiskInserted

Created from <SR 3-5625470051>

Applies to:

Oracle Database Appliance - All Versions
Oracle Database Appliance Software - Version 2.1.0.1 and later
Linux x86-64

Symptoms
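
The disk listing below matches the column format of "oakcli show disk" (assumed here to be the command that produced the output):

   # oakcli show disk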


NAME      PATH        TYPE    STATE           STATE_DETAILS
------------------------------------------------------------------
pd_00     /dev/sdam   HDD     ONLINE          Good
...
pd_21     /dev/sdat   SSD     ONLINE          Good
pd_22     /dev/sdaf   SSD     UNINITIALIZED   NewDiskInserted   <<<<
pd_23     /dev/sdah   SSD     ONLINE          Good

  

Changes

A defective disk was replaced.

Cause

There are currently several potential causes for a disk showing as UNINITIALIZED after disk replacement.

This note discusses some of the potential causes and corrective actions, but should not be considered comprehensive.
 

Solution

Development has suggested the following methods:

Reason #1.

After pulling or removing the problem disk, not enough time was allowed before re-inserting the NEW disk.

A) Perform the following steps (run the commands on both nodes):

   1) multipath -F
   2) multipath -v2
   3) remove the disk again
   4) wait five minutes
   5) insert the disk back
   6) oakcli restart oak     - on the first node, then
      oakcli restart oak     - on the second node
   7) oakcli show ismaster   < should show ASMASTER on the first node and ASSLAVE on the second node

  

The above procedure does not require downtime. For reference:

   multipath -F    ==> flush all unused multipath device maps
   multipath -v2   ==> print all info: detected paths, coalesced paths (i.e. multipaths), and device maps
 

NOTE:

It is possible that metadata for the OLD disk still exists after a disk replacement fails.
There are a few key areas where this information can persist, including multipath.conf and asmappl.config.
If the old disk information is still present AND references anything other than group 0, more cleanup is required.

Comment:

   Group 0 means the disk is not associated with a working ODA diskgroup.
   Groups 1, 2, 3 (and, on X5-2, diskgroup 4) are working ASM diskgroups.
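
A hedged way to look for stale OLD-disk metadata (the asmappl.config path is assumed; <old_wwid> is a placeholder for the OLD disk's multipath WWID, intentionally not filled in here):

   # grep -i <old_wwid> /etc/multipath.conf            # stale multipath entry for the OLD disk
   # grep -i sdaf /opt/oracle/extapi/asmappl.config    # stale appliance-library entry (path assumed)

If stale entries reference a working group (1-4), engage Oracle Support before editing these files.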

  

Note: Restart the oak process with the following command, and make sure that oak is working on both nodes.

On Node 1:

   # oakcli restart oak

On Node 2:

   # oakcli restart oak

Wait 5 minutes and check the status of the disks again.
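
For example, the replaced disk from the Symptoms output (slot 22 there) should now report ONLINE/Good:

   # oakcli show disk | grep pd_22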

A simple diagnostic to confirm disk status and health is the oakcli STORDIAG command.

See Note 1497610.1 for usage.


 

For more comprehensive information, or for assistance via a Service Request, please provide the following:

 

1) For X3-2, X4-2 or X5-2:

       oakcli stordiag e#_pd_##    < where e# is the enclosure (0 or 1) and ## is the SLOT number (0-23)

   For V1:

       oakcli stordiag pd_##       < where ## is the SLOT number
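
For example, for the UNINITIALIZED SSD in slot 22 from the Symptoms output (enclosure 0 assumed here for illustration):

   # oakcli stordiag e0_pd_22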

 

2) oakcli manage diagcollect -storage

   Logs are collected to: /opt/oracle/oak/log/<nodeName>/oakdiag/oakStorage-<nodeName>-<YearMonthDay>_<HHMI>.tar.gz
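
For example, with a hypothetical node name oda0node1 and a collection run on 2017-05-18 at 14:30 (illustrative values only), the archive would be:

   /opt/oracle/oak/log/oda0node1/oakdiag/oakStorage-oda0node1-20170518_1430.tar.gz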

 

Supplemental

 

3) Diskdiag   < very good for hardware-based diagnosis

4) Manual file collection
   - messages file from both nodes, covering the time the disk was added
   - complete oakd.log file from both nodes
   - output of "fwupdate list disk" from both nodes
   - oakcli stordiag e#_pd_##
         Comment: for X5-2, X4-2, X3-2, where e# is the enclosure (0 or 1) and ## is the SLOT number (0-23);
         for ODA V1 use pd_##, where ## is the SLOT number (0-23)
   - ASM alert.logs from each node
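
A hedged collection sketch for the items above (log locations assumed; sdaf is the device name from the Symptoms output):

   # fwupdate list disk                          # disk/firmware inventory, run on both nodes
   # grep -i sdaf /var/log/messages              # messages around the time the disk was added
   # find /opt/oracle/oak/log -name oakd.log     # locate oakd.log on each node for collection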

 

COMMENT - The following had previously been published but is now INTERNAL, as it might cause CORRUPTION. Do not use it:

   "... addasmdisk ..."   << this has bugs in some older versions, especially if there is a SECOND JBOD
                          -- consult with ODA BDE prior to using, unless on 12.1.2.8 or higher

Comment 1 - Suggestion from an Oracle support analyst:

"... Reboot the nodes before running any action plan, and wait 5-10 minutes after the reboot.
     I have rebooted the nodes and this has resolved the issue every time ..."

-- Not always the case, but often a resolution if downtime is not a problem.

 

Comment 2 - Please note, there is also an issue that can occur in special circumstances:

"... The customer in bug 16803770 - unable to get rid of uninitialized newdiskinserted -
     changed the DATA/REDO allocation ratio, upgraded to 2.2, and somehow the patch
     did not update /opt/oracle/oak/conf/oak_conf.xml correctly to that effect. A typical
     sign of that would be the following in oakd.log:

2013-07-02 16:05:30.103: [pd_19][1082747200] {0:0:166} [resource_initialize] Invalid partition size
..."

Example of Partitions

  1. Check the oak_conf.xml file on both nodes to ensure the data/reco ratios reflect the customer-specific settings and not stale default values. The correct values per backup configuration are:

     1. Local Backup
        # grep -A2 'data:' /opt/oracle/oak/conf/oak_conf.xml
                  <Value>data:43:HDD</Value>
                  <Value>reco:57:HDD</Value>
                  <Value>redo:100:SSD</Value>
     2. External Backup
        # grep -A2 'data:' /opt/oracle/oak/conf/oak_conf.xml
                  <Value>data:86:HDD</Value>
                  <Value>reco:14:HDD</Value>
                  <Value>redo:100:SSD</Value>
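
A quick way to confirm both nodes agree (the node name node2 is a hypothetical placeholder):

   # grep -A2 'data:' /opt/oracle/oak/conf/oak_conf.xml > /tmp/oak_ratios_local
   # ssh node2 "grep -A2 'data:' /opt/oracle/oak/conf/oak_conf.xml" | diff /tmp/oak_ratios_local -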

  

 

Comment 3

This problem should be less likely to occur on newer ODA versions:

 "...ALWAYS check the ASM Alert.logs for both nodes and review to confirm that all diskgroups are available.
     A more serious condition can exist due to diskgroups being offlined beyond the redundancy capacity..."
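
A hedged way to locate and scan the ASM alert logs on each node (diagnostic paths vary by installation; /u01/app is assumed here, and the log path in the second command is a placeholder):

   # find /u01/app -name "alert_+ASM*.log" 2>/dev/null
   # grep -iE "offlin|dismount" /path/to/alert_+ASM1.log    # look for offlined disks or dismounted diskgroups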

  

 

 

 


Attachments
This solution has no attachment