Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-75-2037999.1
Update Date:2015-12-15

Solution Type  Troubleshooting Sure

Solution  2037999.1 :   Troubleshooting ACFS Repository/VM Mounting Issues on Oracle Database Appliance  


Related Items
  • Oracle Database Appliance
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>Oracle Database Appliance>DB: ODA_EST




Applies to:

Oracle Database Appliance - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Purpose

 This document provides step-by-step debugging for cases where ACFS shared repositories are not mounted on DOM0 or ODA BASE, with the result that 'oakcli show vm' and 'oakcli show repo' do not list the VMs and repositories.

Troubleshooting Steps

1. First, check the output of the following commands from ODA BASE:

 

[root@xxx~]# oakcli show repo

 

NAME     TYPE    NODENUM  FREE SPACE  STATE   SIZE
imp_sol  shared  0        61.58%      ONLINE  512000.0M <<<<<<<<
imp_sol  shared  1        61.58%      ONLINE  512000.0M <<<<<<<<

 

[root@xxxxx ~]# oakcli show vm

 

NAME      NODENUM  MEMORY  VCPU  STATE   REPOSITORY
solwork1  1        4096M   3     ONLINE  imp_sol <<<<<<<<
solwork2  1        4096M   3     ONLINE  imp_sol <<<<<<<<
solwork3  1        4096M   3     ONLINE  imp_sol <<<<<<<<
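
As a rough aid for this first check, the sketch below lists the node numbers on which a repository is reported ONLINE. The function name repo_online_nodes is hypothetical, and the column positions (name in column 1, node number in column 3, state in column 5) are an assumption taken from the sample output above.

```shell
# Hypothetical helper: reads `oakcli show repo` output on stdin and prints
# the NODENUM of every row where the given repository is ONLINE.
# Assumes the column order shown in the sample output of this note.
repo_online_nodes() {
    awk -v repo="$1" '$1 == repo && $5 == "ONLINE" { print $3 }'
}
```

Possible usage: oakcli show repo | repo_online_nodes imp_sol. For a healthy shared repository this should print both node numbers (0 and 1), one per line.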

 

If the required repository or VMs are not listed, continue with the following steps:

 

2. Check whether the repositories are mounted at the OS level on both ODA BASE and DOM0:

[root@xxxx~]# df -kh
Filesystem Size Used Avail Use% Mounted on
/dev/xvda2 55G 44G 8.3G 85% /
/dev/xvda1 460M 43M 394M 10% /boot
/dev/xvdb1 92G 41G 47G 47% /u01
tmpfs 79G 1.2G 78G 2% /dev/shm
/dev/asm/acfsvol-151 50G 178M 50G 1% /cloudfs
/dev/asm/datafsvol-151
5.0G 87M 5.0G 2% /odadatafs
/dev/asm/datastore-402 180G 15G 166G 9% /u01/app/oracle/oradata/datastore
/dev/asm/imp_sol-151 500G 193G 308G 39% /u01/app/sharedrepo/imp_sol <<<<<<<<<<<<<<

In most cases the ACFS volume is missing on ODA BASE, on DOM0, or on both. It must be mounted on both.
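
The mount check can be scripted along these lines. This is a minimal sketch: the function name is_mounted is hypothetical, and it assumes mount points contain no spaces, since it compares only the last field of each df line.

```shell
# Hypothetical helper: reads df output on stdin and succeeds (exit 0)
# if the given mount point appears as the last field of any line.
# Matching on the last field also copes with lines that df wraps
# when the device name is long.
is_mounted() {
    awk -v mp="$1" '$NF == mp { found = 1 } END { exit !found }'
}
```

Possible usage on each of ODA BASE and DOM0: df -P | is_mounted /u01/app/sharedrepo/imp_sol || echo "imp_sol is not mounted"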

 

3. Check the status of the resources in CRS:

Make sure all the CRS daemons are ONLINE:

 

 [root@XXXX bin]# ./crsctl stat res -t -init
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm 1 ONLINE ONLINE odax32in2 Started,STABLE
ora.cluster_interconnect.haip 1 ONLINE ONLINE odax32in2 STABLE
ora.crf 1 ONLINE ONLINE odax32in2 STABLE
ora.crsd 1 ONLINE ONLINE odax32in2 STABLE
ora.cssd 1 ONLINE ONLINE odax32in2 STABLE
ora.cssdmonitor 1 ONLINE ONLINE odax32in2 STABLE
ora.ctssd 1 ONLINE ONLINE odax32in2 ACTIVE:0,STABLE
ora.diskmon 1 OFFLINE OFFLINE STABLE
ora.drivers.acfs 1 ONLINE ONLINE odax32in2 STABLE
ora.evmd 1 ONLINE ONLINE odax32in2 STABLE
ora.gipcd 1 ONLINE ONLINE odax32in2 STABLE
ora.gpnpd 1 ONLINE ONLINE odax32in2 STABLE
ora.mdnsd 1 ONLINE ONLINE odax32in2 STABLE
ora.storage 1 ONLINE ONLINE odax32in2 STABLE

 

# crsctl stat res -t

 

...............

 

ora.RECO.IMP_SOL.advm
ONLINE ONLINE odax32in1 Volume device /dev/asm/imp_sol-151 is online,STABLE
ONLINE ONLINE odax32in2 Volume device /dev/asm/imp_sol-151 is online,STABLE

..........

ora.reco.imp_sol.acfs
ONLINE ONLINE odax32in1 mounted on /u01/app/sharedrepo/imp_sol,STABLE
ONLINE ONLINE odax32in2 mounted on /u01/app/sharedrepo/imp_sol,STABLE
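
For the daemon check at the start of this step, a small filter can flag resources whose target is ONLINE but whose state is not. The sketch below is an assumption on top of this note: it expects the non-tabular NAME=/TARGET=/STATE= output that crsctl prints when -t is omitted (crsctl stat res -init), and the function name not_online_resources is hypothetical.

```shell
# Hypothetical helper: reads `crsctl stat res -init` (no -t) on stdin and
# prints the name of each resource whose TARGET is ONLINE but whose STATE
# is anything else. Resources that are intentionally OFFLINE, such as
# ora.diskmon in the sample above, are not reported.
not_online_resources() {
    awk -F= '
        $1 == "NAME"   { name = $2 }
        $1 == "TARGET" { target = $2 }
        $1 == "STATE"  { if (target ~ /^ONLINE/ && $2 !~ /^ONLINE/) print name }
    '
}
```

Possible usage from the Grid home bin directory: ./crsctl stat res -init | not_online_resources. Empty output suggests every daemon that should be running is running.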

 

4. As the grid user, make sure the ACFS volume is listed by asmcmd volinfo:

 

# asmcmd volinfo -a    ( on some versions this is --all )

 ...
 ...

Volume Name: IMP_SOL
Volume Device: /dev/asm/imp_sol-151
State: ENABLED
Size (MB): 512000
Resize Unit (MB): 64
Redundancy: HIGH
Stripe Columns: 8
Stripe Width (K): 1024
Usage: ACFS
Mountpath: /u01/app/sharedrepo/imp_sol

...............

 

5. Once it is listed, make sure it is registered in the acfsutil registry:

 [root@xxxx ~]# /sbin/acfsutil registry -l
........
Device : /dev/asm/imp_sol-151 : Mount Point : /u01/app/sharedrepo/imp_sol : Options : none : Nodes : all : Disk Group : RECO : Volume : IMP_SOL
.......

 

6. If it is not registered, add it. For example:

  /sbin/acfsutil registry -a -f /dev/asm/imp_sol-151 /u01/app/sharedrepo/imp_sol
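
Steps 5 and 6 can be combined into one check. The sketch below only prints the registration command instead of running it, so it can be reviewed first. The function names is_registered and suggest_registration are hypothetical, and the matching assumes the "Device : <device> :" layout shown in the registry listing above.

```shell
# Hypothetical helper: reads `acfsutil registry -l` output on stdin and
# succeeds if the given volume device has a registry entry.
is_registered() {
    grep -F -q -- "Device : $1 :"
}

# Hypothetical helper: prints the acfsutil command from step 6 when the
# device is missing from the registry listing supplied on stdin.
suggest_registration() {
    dev=$1
    mnt=$2
    if is_registered "$dev"; then
        echo "registered: $dev"
    else
        echo "/sbin/acfsutil registry -a -f $dev $mnt"
    fi
}
```

Possible usage: /sbin/acfsutil registry -l | suggest_registration /dev/asm/imp_sol-151 /u01/app/sharedrepo/imp_sol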

 

7. Once it is added, restart the master oakd daemon:

 

[grid@xxxxx~]$ oakcli show ismaster
OAKD is in Master Mode

If this node reports Master Mode, restart oakd on it as root:

  [root@xxxxx bin]# oakcli restart oak
2015-07-29 21:14:36.128448389:[init.oak]:[Restarting the oakd..]
2015-07-29 21:14:36.187810018:[init.oak]:[Killing the running oakd with pid 78620]
2015-07-29 21:14:46.253558313:[init.oak]:[Successfully re-started the oakd..]

 

8. If the volumes are still not mounted on DOM0 and ODA BASE, review the following log files and command output:

From ODA BASE:
/opt/oracle/oak/log/<node name>/odabaseagent/odabaseagent*
$GRID_HOME/crsdata/<hostname>/acfs/*
Output of: /sbin/acfsutil info fs

From DOM0:
/opt/oracle/oak/log/<node name>-dom0/oakvmagent/*

9. Run dmesg | grep USM on both nodes and compare the results. In some recent cases the first and second nodes did not have the same USM version, probably because patching failed on one node without being noticed, or because GI patches were re-applied.

10. Collect the output of: /u01/app/12.1.0.2/grid/bin/acfsdriverstate -orahome /u01/app/12.1.0.2/grid

11. TFA can be used to collect the ACFS logs: tfactl diagcollect -acfs -from "mmm/dd/yyyy hh:mm:ss" -to "mmm/dd/yyyy hh:mm:ss"

 

Internal only:

Sometimes the head of the file system must be captured with dd so that it can be checked for corruption. For example:

    dd if=/dev/asm/datastore-466 bs=1024k count=200 | gzip -c  >/tmp/vol.200m_node1.gz  (example) 

    dd if=/dev/asm/volume-name bs=1024K count=200 | gzip -c > /tmp/dd.output  (general)

If needed, raise the ACFS log level to 5 and then try the mount again:
   acfsutil log -l 5 -p ofs
   acfsutil log -l 5 -p avd

Reproduce the problem, then save the kernel log:

    cat /proc/oks/log > /tmp/oks.log

Then reset the log level back to 2:

   acfsutil log -l 2 -p ofs
   acfsutil log -l 2 -p avd 

Review /tmp/oks.log.
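
The raise, reproduce, capture, and reset sequence above can be wrapped so that the log level is always put back. This is only a sketch under stated assumptions: the function names, the ACFSUTIL and OKS_LOG override variables, and the choice to pass the reproduction command as arguments are all hypothetical conveniences, and only the ofs and avd products named in this note are touched.

```shell
# Overridable locations (assumptions for illustration and testing).
ACFSUTIL=${ACFSUTIL:-/sbin/acfsutil}
OKS_LOG=${OKS_LOG:-/proc/oks/log}

# Hypothetical helper: set the ACFS log level for the ofs and avd products.
set_acfs_log_level() {
    for product in ofs avd; do
        "$ACFSUTIL" log -l "$1" -p "$product"
    done
}

# Hypothetical helper: raise logging to 5, run the reproduction command
# given after the destination file, save the kernel log, then reset to 2.
# A failing reproduction command is expected and does not stop the capture.
capture_acfs_debug() {
    dest=$1
    shift
    set_acfs_log_level 5
    "$@"
    cat "$OKS_LOG" > "$dest"
    set_acfs_log_level 2
}
```

Possible usage as root: capture_acfs_debug /tmp/oks.log mount -t acfs /dev/asm/imp_sol-151 /u01/app/sharedrepo/imp_sol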

 


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.