Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution

Sure Solution 1988829.1: Infiniband HCA Replaced and Post Hardware Replacement IPMP bondib0 Group Is Marked As Failed
Applies to:
Oracle SuperCluster T5-8 Full Rack - Version All Versions to All Versions [Release All Releases]
SPARC SuperCluster T4-4 - Version All Versions to All Versions [Release All Releases]
Sun SPARC Sun OS

Symptoms
Grid Infrastructure fails to start on the SuperCluster DB compute node, and the underlying IB IPMP configuration is not persistent after the IB card replacement. No IB hosts are visible after the card replacement.

Changes
InfiniBand HCA (host channel adapter) card replacement on SuperCluster DB compute nodes.

Cause
After the IB card replacement, the Oracle Clusterware stack fails to come online when the node reboots. The issue is caused by the IPMP group bondib0 carrying a state of "failed" left over from when the HCA card failed. For example:
root@dbnode01:~# ipmpstat -g
GROUP       GROUPNAME   STATE     FDT     INTERFACES
bondib0     bondib0     failed    --      --             ---> This seems to indicate IPMP is configured, but it is incorrect!
bondeth0    bondeth0    ok        --      eth2 eth1
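Before rebuilding anything, it is also worth confirming that the operating system actually sees the ports of the replacement HCA. This check is not in the original note; it is a minimal sketch using standard Solaris 11 dladm sub-commands:

root@dbnode01:~# dladm show-phys | grep -i infiniband   # the physical IB ports (ib0 and ib1 in this note) should be listed
root@dbnode01:~# dladm show-ib                          # per-port state, HCA GUID and available PKEYs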
In some cases the presence of IB partitions on the switch may cause this; however, most deployments do not have partitions configured. You can verify this on every IB switch, e.g.:

# smpartition list active
# Sun DCS IB partition config file
# This file is generated, do not edit
#! version_number : 1
Default=0x7fff, ipoib : ALL_CAS=full, ALL_SWITCHES=full, SELF=full;
Solution

1. If CRS is partially running or attempting to start, shut it down on the problem node.

root@dbnode01:~# crsctl stop crs -f
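The note assumes the stack is fully down before the network configuration is touched. If you want to double-check first (a sketch, not part of the original steps), the standard Clusterware status command can be used:

root@dbnode01:~# crsctl check crs   # should report that Oracle High Availability Services / CRS is not running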
2. On a healthy node (dbnode02 in this example), record the IPMP group layout and the bondib0 address and netmask for reference.

root@dbnode02:~# ipmpstat -g
GROUP       GROUPNAME   STATE     FDT     INTERFACES
bondib0     bondib0     ok        --      bondib0_1 bondib0_0   -------> IPMP group configuration!
bondeth0    bondeth0    ok        --      eth2 eth1

root@dbnode02:~# ipadm show-addr
ADDROBJ           TYPE     STATE      ADDR
lo0/v4            static   ok         127.0.0.1/8
eth0/v4           static   ok         97.253.193.18/26
eth3/v4a          static   ok         172.17.80.239/22
bondeth0/v4       static   ok         97.253.193.82/26
bondeth0/v4a      static   ok         97.253.193.90/26
bondib0/v4        static   ok         192.168.2.200/26   -------> IP address of another node and its mask!
net6/v4           static   ok         169.254.182.77/24
lo0/v6            static   ok         ::1/128
eth3/v4           static   disabled   97.253.193.146/26
eth3/bkp          static   disabled   172.17.80.239/22
We are only using the above output as a reference point, so that we can correctly confirm the expected IPMP group setup and the IP address to use, along with its netmask.
3. On the problem node, recreate the IPoIB partition datalinks over the new HCA (PKEY 0xFFFF selects full membership in the default partition) and rebuild the bondib0 IPMP group.

root@dbnode01:~# dladm create-part -l ib0 -P 0xffff bondib0_0
root@dbnode01:~# dladm create-part -l ib1 -P 0xffff bondib0_1

root@dbnode01:~# dladm show-part
LINK         PKEY   OVER   STATE     FLAGS
bondib0_0    FFFF   ib0    unknown   ----
bondib0_1    FFFF   ib1    unknown   ----

root@dbnode01:~# ipmpstat -g
GROUP       GROUPNAME   STATE     FDT     INTERFACES
bondib0     bondib0     failed    --      --
bondeth0    bondeth0    ok        --      eth2 eth1

root@dbnode01:~# ipadm delete-ip bondib0_0

root@dbnode01:~# ipadm show-if
IFNAME      CLASS      STATE    ACTIVE   OVER
lo0         loopback   ok       yes      --
eth0        ip         ok       yes      --
eth1        ip         ok       yes      --
eth2        ip         ok       yes      --
eth3        ip         failed   no       --
bondeth0    ipmp       ok       yes      eth1 eth2
bondib0_0   ip         down     no       --
bondib0_1   ip         down     no       --

root@dbnode01:~# ipadm create-ipmp bondib0

root@dbnode01:~# ipadm show-if
IFNAME      CLASS      STATE    ACTIVE   OVER
lo0         loopback   ok       yes      --
eth0        ip         ok       yes      --
eth1        ip         ok       yes      --
eth2        ip         ok       yes      --
eth3        ip         failed   no       --
bondeth0    ipmp       ok       yes      eth1 eth2
bondib0_0   ip         ok       yes      --
bondib0_1   ip         ok       yes      --
bondib0     ipmp       down     no       bondib0_0 bondib0_1
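In this example the two partition links end up under the new bondib0 group without further action, presumably because their persistent configuration from before the replacement still references the group. If on your system bondib0_0 and bondib0_1 do not appear under bondib0 in ipadm show-if, a hedged sketch of how they can be plumbed and added explicitly with standard ipadm sub-commands is shown below; these extra steps are not part of the original note:

root@dbnode01:~# ipadm create-ip bondib0_0                          # plumb the IP interfaces if they are missing
root@dbnode01:~# ipadm create-ip bondib0_1
root@dbnode01:~# ipadm add-ipmp -i bondib0_0 -i bondib0_1 bondib0   # place both underlying interfaces in the group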
4. You will now need the IP address to be used for this interface on the problem node, determined from the reference information gathered in step 2.
root@dbnode01:~# ipadm create-addr -T static -a local=192.168.2.199/26 bondib0/v4
root@dbnode01:~# ipadm show-if
IFNAME      CLASS      STATE    ACTIVE   OVER
lo0         loopback   ok       yes      --
eth0        ip         ok       yes      --
eth1        ip         ok       yes      --
eth2        ip         ok       yes      --
eth3        ip         failed   no       --
bondeth0    ipmp       ok       yes      eth1 eth2
bondib0_0   ip         ok       yes      --
bondib0_1   ip         ok       yes      --
bondib0     ipmp       ok       yes      bondib0_0 bondib0_1
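The original note goes straight on to restarting CRS; as an optional sanity check (a minimal sketch using commands already shown in this note plus a simple ping, with the addresses from this example), the repaired group can be verified end-to-end first:

root@dbnode01:~# ipmpstat -g                  # bondib0 should now report STATE ok
root@dbnode01:~# ipadm show-addr bondib0/v4   # should show 192.168.2.199/26 from this example
root@dbnode01:~# ping 192.168.2.200           # the peer node's bondib0 address recorded in step 2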
5. Restart the Clusterware stack on the problem node.

root@dbnode01:~# crsctl start crs

Verify the stack is healthy. Please note this change is persistent and will survive a reboot.
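The note does not show how to verify the stack; a minimal sketch using standard Clusterware commands (run as root, with the Grid Infrastructure home in the PATH) is:

root@dbnode01:~# crsctl check crs     # CRS, CSS and EVM should all report online
root@dbnode01:~# crsctl stat res -t   # cluster resources, including the databases, should be ONLINE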