Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1605483.1
Update Date: 2014-04-25
Keywords:

Solution Type: Technical Instruction

Solution 1605483.1: How to determine Exadata DB or Storage Cell node location within a single or multi-rack configuration using the node name


Related Items
  • Exadata X4-2 Hardware
  • Exadata X3-2 Hardware
  • Exadata Database Machine X2-8
  • Exadata X3-8 Hardware
  • Exadata Database Machine X2-2 Hardware
  • Exadata X3-8b Hardware
  • Exadata Database Machine V2
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>Oracle Exadata>DB: Exadata_EST
  • _Old GCS Categories>ST>Server>Engineered Systems>Exadata>Hardware


How to determine DB or Storage Cell node location within a single or multi-rack configuration using the node name

Oracle Confidential PARTNER - Available to partners (SUN).
Reason: There is no action for customers in this note

Applies to:

Exadata Database Machine V2 - Version All Versions and later
Exadata Database Machine X2-8 - Version All Versions and later
Exadata X3-8 Hardware - Version All Versions and later
Exadata X4-2 Hardware - Version All Versions and later
Exadata X3-8b Hardware - Version All Versions and later
Linux x86-64

Goal

How to determine Exadata DB or Storage Cell node location within a single or multi-rack configuration using the node name

When completing the ATR part details for a field task, it can be difficult to determine the node location from the node name alone.

Solution

In the sundiag output, under the "net" directory, there is an iblinkinfo file.  Using this file you can build a node name matrix to determine which node has the fault.

Three examples follow: one for a storage cell node and two for DB nodes.

Items to look for as you are reviewing the file:

  • After the node name and before the IP address will be either a "C" or an "S".  A "C" indicates a cell (storage) node; an "S" indicates a database node.
  • A DB node can have multiple IB links.
  • It is easier to copy the contents of the iblinkinfo file into another file, then strip out all the duplicate links and switch-to-switch links.
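The pruning step in the last bullet can be sketched with standard text tools.  This is a hedged sketch, not a supported procedure: the sample lines are condensed from the examples in this note, and the /tmp paths and the file name iblinkinfo.out are assumptions.

```shell
# Sketch: reduce an iblinkinfo dump to one line per host link.
# Host-facing links carry a quoted "name C|S ip HCA-n" field; pure
# switch-to-switch links do not, so matching ' HCA-' drops them.
# The awk filter then keeps only the first line per quoted node field.
cat > /tmp/iblinkinfo.out <<'EOF'
9  2[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 38 1[ ] "nodeacx4043 C 10.243.61.130 HCA-1" ( )
9  9[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 22 1[ ] "nodeadx4007 S 10.243.61.88 HCA-3" ( )
9 13[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 10 14[ ] "SUN DCS 36P QDR switch" ( )
9 24[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 11 2[ ] "nodeacx4052 C 10.243.61.139 HCA-1" ( )
9 24[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 11 2[ ] "nodeacx4052 C 10.243.61.139 HCA-1" ( )
EOF
# Prints one line per unique node field; the switch line and the
# duplicate nodeacx4052 line are dropped.
grep ' HCA-' /tmp/iblinkinfo.out | awk -F'"' '!seen[$2]++ {print $2}'
```

Note that a DB node with multiple IB links (HCA-1, HCA-3, ...) will still appear once per HCA with this filter; dedupe on the node name itself if you want strictly one line per node.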

In the storage cell example, a component had to be replaced in node nodeacx4052.

Using the iblinkinfo file we can see all the nodes connected (this was an X2-8):

           9   12[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>     26    1[  ]  "nodeacx4044 C 10.243.61.131 HCA-1" ( )
           9    2[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      38    1[  ] "nodeacx4043 C 10.243.61.130 HCA-1" ( ) <<< Cell node
           9    3[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>       6    1[  ] "nodeacx4046 C 10.243.61.133 HCA-1" ( )
           9    4[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>       2    1[  ] "nodeacx4045 C 10.243.61.132 HCA-1" ( )
           9    5[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      16    1[  ] "nodeacx4048 C 10.243.61.135 HCA-1" ( )
           9    6[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      40    1[  ] "nodeacx4047 C 10.243.61.134 HCA-1" ( )
           9    7[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      24    1[  ] "nodeadx4007 S 10.243.61.86 HCA-1" ( )
           9    8[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      42    1[  ] "nodeacx4049 C 10.243.61.136 HCA-1" ( )
           9    9[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      22    1[  ] "nodeadx4007 S 10.243.61.88 HCA-3" ( ) <<< DB node
           9   19[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      47    2[  ] "nodeacx4055 C 10.243.61.142 HCA-1" ( )
           9   20[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      15    2[  ] "nodeacx4056 C 10.243.61.143 HCA-1" ( )
           9   21[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      13    2[  ] "nodeacx4053 C 10.243.61.140 HCA-1" ( )
           9   22[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      19    2[  ] "nodeacx4054 C 10.243.61.141 HCA-1" ( )
           9   23[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>       5    2[  ] "nodeacx4051 C 10.243.61.138 HCA-1" ( )
           9   24[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      11    2[  ] "nodeacx4052 C 10.243.61.139 HCA-1" ( ) <<< Problem Cell node
           9   25[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      37    2[  ] "nodeadx4008 S 10.243.61.93 HCA-4" ( ) <<< DB Node
           9   26[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      45    2[  ] "nodeacx4050 C 10.243.61.137 HCA-1" ( )

There are 14 storage cell nodes in each rack for a full rack.  The cell nodes are numbered 4043 (cell 01) to 4056 (cell 14).
The problem node is therefore cell node 10.
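The numbering arithmetic can be sketched in shell.  The prefix nodeacx and base suffix 4043 come straight from this example; on another system both would differ.

```shell
# Sketch: derive the cell position from the numeric suffix of the node
# name.  In this rack cell 01 is nodeacx4043, so position = suffix - 4043 + 1.
name=nodeacx4052     # problem node from the example above
base=4043            # suffix of cell 01 in this rack
suffix=${name#nodeacx}
echo "cell node $((suffix - base + 1))"   # prints: cell node 10
```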

In the second example there was a bad DIMM in DB node system4bsxd03, in an X3-8 multi-rack configuration:


           3    1[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      25    2[  ] "system4bsxs02c C 10.196.128.18 HCA-1" ( ) <<<< Cell Node
           3    2[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      23    2[  ] "system4bsxs01c C 10.196.128.17 HCA-1" ( )
           3    3[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      30    2[  ] "system4bsxs04c C 10.196.128.20 HCA-1" ( )
           3    4[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      21    2[  ] "system4bsxs03c C 10.196.128.19 HCA-1" ( )
           3    5[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      15    2[  ] "system4bsxs06c C 10.196.128.22 HCA-1" ( )
           3    6[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      28    2[  ] "system4bsxs05c C 10.196.128.21 HCA-1" ( )
           3    7[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      41    2[  ] "system4bsxd01 S 10.196.128.1 HCA-1" ( ) <<<< Database Node
           3    8[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      13    2[  ] "system4bsxs07c C 10.196.128.23 HCA-1" ( )
...
           3   19[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      18    1[  ] "system4bsxs13c C 10.196.128.29 HCA-1" ( )
           3   20[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>       6    1[  ] "system4bsxs14c C 10.196.128.30 HCA-1" ( )
           3   21[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      26    1[  ] "system4bsxs11c C 10.196.128.27 HCA-1" ( )
           3   22[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>       4    1[  ] "system4bsxs12c C 10.196.128.28 HCA-1" ( )
           3   23[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      10    1[  ] "system4bsxs09c C 10.196.128.25 HCA-1" ( )
           3   24[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      16    1[  ] "system4bsxs10c C 10.196.128.26 HCA-1" ( )
           3   25[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      34    1[  ] "system4bsxd02 S 10.196.128.8 HCA-4" ( )
           3   26[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>       8    1[  ] "system4bsxs08c C 10.196.128.24 HCA-1" ( )
  ...
          80    1[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      86    2[  ] "system4bsxs16c C 10.196.128.48 HCA-1" ( )
          80    2[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      51    2[  ] "system4bsxs15c C 10.196.128.47 HCA-1" ( )
          80    3[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      71    2[  ] "system4bsxs18c C 10.196.128.50 HCA-1" ( )
          80    4[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      77    2[  ] "system4bsxs17c C 10.196.128.49 HCA-1" ( )
          80    5[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      61    2[  ] "system4bsxs20c C 10.196.128.52 HCA-1" ( )
          80    6[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      59    2[  ] "system4bsxs19c C 10.196.128.51 HCA-1" ( )
          80    7[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      88    2[  ] "system4bsxd03 S 10.196.128.31 HCA-1" ( ) <<<< Problem Node
          80    8[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      94    2[  ] "system4bsxs21c C 10.196.128.53 HCA-1" ( )
...
          80   19[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      66    1[  ] "system4bsxs27c C 10.196.128.59 HCA-1" ( )
          80   20[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      49    1[  ] "system4bsxs28c C 10.196.128.60 HCA-1" ( )
          80   21[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      62    1[  ] "system4bsxs25c C 10.196.128.57 HCA-1" ( )
          80   22[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      64    1[  ] "system4bsxs26c C 10.196.128.58 HCA-1" ( )
          80   23[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      91    1[  ] "system4bsxs23c C 10.196.128.55 HCA-1" ( )
          80   24[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      72    1[  ] "system4bsxs24c C 10.196.128.56 HCA-1" ( )
          80   25[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      76    1[  ] "system4bsxd04 S 10.196.128.38 HCA-4" ( )
          80   26[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      89    1[  ] "system4bsxs22c C 10.196.128.54 HCA-1" ( )
 
We can see from the file that there are 4 DB nodes, named system4bsxd01 to system4bsxd04.  We know that an X2-8 or X3-8 configuration has two DB nodes per rack, and nodes are numbered from the bottom up.  system4bsxd01 and system4bsxd02 are in rack 1; system4bsxd03 and system4bsxd04 are in the second rack.  The problem node is therefore DB node 01 in the second rack.
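The placement rule just applied (two DB nodes per X2-8/X3-8 rack, numbered bottom up, rack by rack) can be sketched as simple integer arithmetic.  The name system4bsxd03 is from this example.

```shell
# Sketch: with two DB nodes per rack, DB node N (from a name like
# system4bsxd03, so N = 3) sits in rack ((N-1)/2)+1 as in-rack node
# ((N-1)%2)+1, counting from the bottom of each rack.
n=3   # system4bsxd03, the problem node
echo "rack $(( (n - 1) / 2 + 1 )), DB node 0$(( (n - 1) % 2 + 1 ))"
# prints: rack 2, DB node 01  (matching the conclusion above)
```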


The third example is a multi-rack configuration mixing an X2-2 half rack and an X3-2 half-rack-to-full-rack upgrade.

A DB node (sc_branch0012) has a faulty component.  In this example the customer named the nodes using a running-number convention.

  ============================     Half Rack    ============================    

           2    1[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      15    2[  ] "sc-branch0644 C 192.168.10.43 HCA-1" ( )
           2    2[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>       5    2[  ] "sc-branch0643 C 192.168.10.42 HCA-1" ( )
           2    3[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      17    2[  ] "sc-branch0646 C 192.168.10.45 HCA-1" ( )
           2    4[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      13    2[  ] "sc-branch0645 C 192.168.10.44 HCA-1" ( )
           2    5[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      27    2[  ] "sc-branch0648 C 192.168.10.47 HCA-1" ( )
           2    6[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>       7    2[  ] "sc-branch0647 C 192.168.10.46 HCA-1" ( )
           2    7[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      21    2[  ] "sc-branch0639 S 192.168.10.1 HCA-1" ( )
           2    8[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      11    2[  ] "sc-branch0649 C 192.168.10.48 HCA-1" ( )
           2    9[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      25    2[  ] "sc-branch0641 S 192.168.10.3 HCA-1" ( )
           2   10[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      23    2[  ] "sc-branch0640 S 192.168.10.2 HCA-1" ( )
           2   12[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      19    2[  ] "sc-branch0642 S 192.168.10.4 HCA-1" ( )

============================     Full Rack  (original half rack)   ============================
          68    1[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      75    1[  ] "sc_branch0056 C 192.168.10.6 HCA-1" ( )
          68    2[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      69    1[  ] "sc_branch0055 C 192.168.10.5 HCA-1" ( )
          68    3[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      54    1[  ] "sc_branch0058 C 192.168.10.8 HCA-1" ( )
          68    4[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      71    1[  ] "sc_branch0057 C 192.168.10.7 HCA-1" ( )
          68    5[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      52    1[  ] "sc_branch0060 C 192.168.10.10 HCA-1" ( )
          68    6[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      60    1[  ] "sc_branch0059 C 192.168.10.9 HCA-1" ( )
          68    7[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      62    1[  ] "sc_branch0010 S 192.168.10.1 HCA-1" ( )
          68    8[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      73    1[  ] "sc_branch0061 C 192.168.10.11 HCA-1" ( )
          68    9[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>       8    1[  ] "sc_branch0012 S 192.168.10.3 HCA-1" ( )  <<<< Problem Node
          68   10[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      58    1[  ] "sc_branch0011 S 192.168.10.2 HCA-1" ( )
          68   12[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      56    1[  ] "sc_branch0013 S 192.168.10.4 HCA-1" ( )

============================     Half Rack   (second half upgrade)  ============================ 
          68   19[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      31    2[  ] "sc-branch0374 C 192.168.10.40 HCA-1" ( )
          68   20[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      37    2[  ] "sc-branch0375 C 192.168.10.41 HCA-1" ( )
          68   21[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      29    2[  ] "sc-branch0372 C 192.168.10.38 HCA-1" ( )
          68   22[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      41    2[  ] "sc-branch0373 C 192.168.10.39 HCA-1" ( )
          68   23[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      43    2[  ] "sc-branch0370 C 192.168.10.36 HCA-1" ( )
          68   24[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      45    2[  ] "sc-branch0371 C 192.168.10.37 HCA-1" ( )
          68   25[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      39    2[  ] "sc-branch0368 S 192.168.10.34 HCA-1" ( )
          68   26[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      47    2[  ] "sc-branch0369 C 192.168.10.35 HCA-1" ( )
          68   27[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      35    2[  ] "sc-branch0366 S 192.168.10.32 HCA-1" ( )
          68   28[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      33    2[  ] "sc-branch0367 S 192.168.10.33 HCA-1" ( )
          68   30[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>      65    2[  ] "sc-branch0365 S 192.168.10.31 HCA-1" ( )
 

You can tell that this is DB node 03 in the full rack.  The original naming convention used "sc_branch" node names; the newer nodes were named "sc-branch".  The original half of the full rack has DB nodes numbered 0010 to 0013, and the second-half upgrade has DB nodes 0365 to 0368.
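One quick way to separate the two conventions is to grep on the exact prefix, underscore versus hyphen.  This is a sketch: the sample lines are condensed from the listings above, and the /tmp file name is an assumption (use the iblinkinfo file from sundiag).

```shell
# Sketch: split the links by naming convention.  Underscore names
# (sc_branch) belong to the original full rack; hyphen names
# (sc-branch) belong to the later half-rack additions.
cat > /tmp/links.out <<'EOF'
68  9[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 8 1[ ] "sc_branch0012 S 192.168.10.3 HCA-1" ( )
68 30[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 65 2[ ] "sc-branch0365 S 192.168.10.31 HCA-1" ( )
2  7[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 21 2[ ] "sc-branch0639 S 192.168.10.1 HCA-1" ( )
EOF
grep -c '"sc_branch' /tmp/links.out   # original-rack links
grep -c '"sc-branch' /tmp/links.out   # upgrade/half-rack links
```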

Details to remember when doing this:

  • Know the Exadata model that has the problem, e.g. X2-2, X3-8.
  • When reviewing the file you should be able to determine how many racks are involved and the size of each rack, e.g. quarter, half, or full rack.
  • Know the layout of each type of rack:
    • nodes are numbered from the bottom up
    • the number of each node type to expect in each configuration, e.g. an X3-2 half rack has only seven (7) storage cell nodes and four (4) DB nodes
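A quick sanity check on rack size is to count each node type.  This sketch assumes a deduplicated file of the quoted node fields (one line per node, as produced by the pruning step earlier); the node names and /tmp path are illustrative.

```shell
# Sketch: count storage cells ("C") and DB nodes ("S") in a list of
# unique node lines to confirm the expected configuration size,
# e.g. an X3-2 half rack should show C=7 S=4.
cat > /tmp/nodes.out <<'EOF'
nodeacx4043 C 10.243.61.130 HCA-1
nodeacx4044 C 10.243.61.131 HCA-1
nodeadx4007 S 10.243.61.86 HCA-1
EOF
awk '{type[$2]++} END {print "C=" (type["C"]+0) " S=" (type["S"]+0)}' /tmp/nodes.out
# prints: C=2 S=1
```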

 

Going through the file and removing duplicate node names can be confusing; on a large multi-rack configuration it can be a mess.

You may want to use the grep command on the file located on the cores system.  

     Example: # grep nodeacx iblinkinfo.out
     This outputs all of the storage cell node lines from the first example.


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.