Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Predictive Self-Healing Sure Solution
1615444.1: In EECS 2.0.6.0.0, guest vServers are intermittently unable to communicate with (ping) each other over the EoIB VLAN network
Applies to:
Oracle Exalogic Elastic Cloud Software - Version 2.0.6.0.0 to 2.0.6.0.0
Exalogic Elastic Cloud X3-2 Hardware - Version X3 to X3 [Release X3]
Linux x86-64
Oracle Virtual Server (x86-64)

Symptoms
In a fresh or upgraded EECS v2.0.6.0.0 installation on an Exalogic rack, connectivity between guest vServers is unreliable: intermittently, guest vServers cannot communicate with each other over their EoIB/Ethernet interfaces. Restarting the affected vServers may restore connectivity to some vServers, but the problem then appears for vServers that were previously reachable.

Cause
The cause of the issue is that multiple gateway ports are up and VNICs with the same GUID/MAC are up on multiple ports. There should be only one VNIC per switch for each GUID/MAC combination.

Solution
The solution is to apply the patch from the unpublished bug:
Bug 18175326 - INTERIM PATCH FOR BUGS 17294107, 17596353, 17934988, 18120474 FOR EL 2.0.6.0.0
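The base bug fixes included in this patch are:
Bug 17294107 - E2E: EMOC GENERATED HOSTNAMES CONTAIN UNDERSCORE -- AN ILLEGAL CHARACTER
Bug 17596353 - EL 1/2 RACK WITH 2 POOLS. NEED TO SET DISTRIBUTION GROUP TO HAVE ALL 16 NODES
Bug 17934988 - PROBLEM WITH EOIB BETWEEN VSERVERS IN EXALOGIC 2.0.6.0.0
Bug 18120474 - EOIB NETWORK ISSUES AND NOT ABLE TO PING VSERVER FROM ANOTHER ONE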
Details
Please file a Service Request with Oracle Customer Support to obtain the interim patch from Bug 18175326.

Note to EEST Support: This EM Ops Center (EMOC) patch conflicts with the following patch, which provides support for NM2-GW FW v2.1.3-4:
BUG 18034691 - CUMULATIVE PATCH FOR EXALOGIC CONTROL SERVICE (EL VIRTUAL)
Because the two patches are not compatible, customers requiring fixes included in both patches must wait for the Exalogic April 2014 Patch Set Update (PSU). The workaround procedure below may be implemented until the April 2014 PSU is available.
Interim Patch for Bug 18175326
The interim one-off patch for bug 18175326 can be downloaded from ARU 17312922.
Note: ARU 17312922 is distributed only 'By Support', to customers who have filed a Service Request. The patch should be distributed to a customer only with approval from the Exalogic Development team (contact Dev Prabhu (dev.prabhu@oracle.com) for approval, providing the customer name, region and SR number).
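To check whether a switch is in the state described under Cause (the same MAC up on more than one gateway port), the shownics output can be scanned with a short script. The following is a minimal sketch, not an Oracle tool: the function name find_duplicate_macs is hypothetical, it assumes bash is available on the switch, and it assumes only that each VNIC line contains one MAC address and one connector name such as 0A-ETH-1. It does not look at the VNIC state column, so dead/orphaned VNICs are reported too.

```bash
#!/usr/bin/env bash
# Sketch only (hypothetical helper): flag MAC addresses that appear on more
# than one gateway connector port in shownics-style text read from stdin.
# Assumptions: each VNIC line carries one MAC (xx:xx:xx:xx:xx:xx) and one
# connector name such as 0A-ETH-1; the VNIC state is NOT checked.
# Output: one line per duplicated MAC, "MAC port1 port2 ...", sorted.
find_duplicate_macs() {
  grep -E 'ETH' |
    while read -r line; do
      # Pull the MAC and the connector port out of the line, wherever they are.
      mac=$(grep -oE '([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}' <<<"$line" | head -n1)
      port=$(grep -oE '[0-9]+A-ETH-[0-9]+' <<<"$line" | head -n1)
      if [ -n "$mac" ] && [ -n "$port" ]; then
        echo "${mac^^} $port"   # upper-case the MAC so case differences do not matter
      fi
    done |
    sort -u |
    awk '{ n[$1]++; ports[$1] = ports[$1] " " $2 }
         END { for (m in n) if (n[m] > 1) print m ports[m] }' |
    sort
}

# Usage on a gateway switch (bash assumed):
#   shownics | find_duplicate_macs
```

An empty result means no MAC was seen on more than one connector port of that switch.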
Workaround Procedure
The following INTERNAL ONLY section of this note describes the steps to implement a workaround if the customer is unable to apply patch 18175326 because it conflicts with EMOC cumulative patch 18034691, which may be critical for the customer. This workaround procedure must be performed under support supervision:

1. Enter Maintenance Period
Plan a maintenance window in order to perform this procedure.

2. Shut Down All Guest vServers
Shut them down in parallel from the EMOC console to reduce the time required.

3. Generate a list of ALL guest vServer VNICs on each switch
Ensure the list does NOT contain the VNICs used by the ELControl vServer. For example, determine the ELControl VNICs from any Compute Node (the repository ID in the path is specific to this example environment):
$> (cd /OVS/Repositories/0004fb0000030000f8d9bef44e1586b8/VirtualMachines/;grep -c Control */* | grep -v :0 | cut -d: -f1 | xargs cat | egrep "simple_name|exalogic_vnic|uuid")
OVM_simple_name = 'ExalogicControlOpsCenterPC1'
uuid = '0004fb0000060000302c15fd67d21624'
expose_host_uuid = 1
exalogic_vnic = [{'pkey': [], 'guid': '0x88e22c1fdb58c20a', 'port': '1'}, {'pkey': [], 'guid': '0x88e22c1fdb58c20b', 'port': '2'}]
OVM_simple_name = 'ExalogicControlOpsCenterPC2'
uuid = '0004fb000006000095cffcd18478c782'
expose_host_uuid = 1
exalogic_vnic = [{'pkey': [], 'guid': '0xe3a9e9709fd2425c', 'port': '1'}, {'pkey': [], 'guid': '0xe3a9e9709fd2425d', 'port': '2'}]
OVM_simple_name = 'ExalogicControl'
uuid = '0004fb0000060000c3637b689f90c079'
expose_host_uuid = 1
exalogic_vnic = [{'pkey': ['0x8006'], 'guid': '0x9013963aa357ef63', 'port': '1'}, {'pkey': ['0x8006'], 'guid': '0x9013963aa357ef64', 'port': '2'}]
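The GUIDs that have to be excluded from the deletion list (9013963aa357ef63 and 9013963aa357ef64 in this example) can be pulled out of output like the above instead of being copied by hand. The following is a minimal sketch under stated assumptions: the function name elcontrol_guid_pattern is hypothetical, it assumes each vServer's OVM_simple_name line comes before its exalogic_vnic line (as in the output above), and it assumes the ELControl vServer's simple name is exactly 'ExalogicControl'. It prints the GUIDs without the 0x prefix, joined with "|", ready for egrep -iv.

```bash
#!/usr/bin/env bash
# Sketch only (hypothetical helper): build the egrep exclusion pattern for
# the ELControl VNIC GUIDs from vm.cfg text on stdin.
# Assumptions: OVM_simple_name precedes exalogic_vnic for each vServer, and
# the ELControl vServer is named exactly "ExalogicControl" (override with $1).
elcontrol_guid_pattern() {
  local name="${1:-ExalogicControl}"
  # -F"'" splits on single quotes, so $2 is the quoted simple name.
  awk -F"'" -v want="$name" '
    /^[ \t]*OVM_simple_name/ { cur = $2 }
    /^[ \t]*exalogic_vnic/ && cur == want { print }
  ' |
    grep -oE "'guid': '0x[0-9a-fA-F]+'" |
    sed -E "s/'guid': '0x([0-9a-fA-F]+)'/\1/" |
    paste -sd'|' -
}

# Usage, from the VirtualMachines directory on a Compute Node:
#   pattern=$(grep -l Control */* | xargs cat | elcontrol_guid_pattern)
#   then on the switch: shownics | grep ETH | egrep -iv "$pattern"
```

Matching on the exact simple name keeps the OpsCenter PC vServers' GUIDs out of the pattern, mirroring the exclusion used in the next command.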
Note:
The ELControl vServer VNICs should NOT be deleted, but you must ensure that they will exist on the single connector that remains available at the end of this procedure. In this case, that connector is assumed to be '0A-ETH-1'.

Generate the list of VNICs to delete, excluding the ELControl VNICs:
[root@xxxib01 ~]# shownics | grep ETH | egrep -iv "9013963aa357ef63|9013963aa357ef64" | sed 's/^ *\([0-9]*\) .*:[0-9a-fA-F]* [0-9]* [0-9a-fA-F]* /\1/' > /tmp/vnics_to_delete
Sample output of the above command:
...
180 0A-ETH-1
409 0A-ETH-1
601 0A-ETH-1
156 0A-ETH-1
771 0A-ETH-1
128 0A-ETH-1
317 0A-ETH-1
77 0A-ETH-1
449 0A-ETH-1
49 0A-ETH-1
93 0A-ETH-1
...
This is the format that the deletevnic tool takes:
[root@enxl01sib001 ~]# deletevnic
Usage deletevnic connector vNIC_Id
Example: deletevnic 0a-eth-1 1
Legal values for connector is: 0A-ETH-1, 0A-ETH-2, 0A-ETH-3, 0A-ETH-4, 0A-ETH,
1A-ETH-1, 1A-ETH-2, 1A-ETH-3, 1A-ETH-4, 1A-ETH,
4. Fix the Switch Configuration
Show the VNIC delete commands (dry run):
cat /tmp/vnics_to_delete | while read line; do echo "deleting: \"deletevnic $line\""; done
Then actually delete them:
cat /tmp/vnics_to_delete | while read line; do echo "deleting: \"deletevnic $line\""; deletevnic $line; done
NOTE: This also removes any dead/orphaned VNICs.
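The dry run and the real deletion above differ only in whether deletevnic is executed, so both can be driven from one loop. The following is a minimal sketch, not part of the original procedure: the function name run_from_list and the DRY_RUN variable are hypothetical, and it assumes bash on the switch. Each line of the list is split into words exactly as the unquoted $line in the commands above is, so /tmp/vnics_to_delete is used unchanged; blank lines are skipped and a failing command is reported rather than stopping the loop.

```bash
#!/usr/bin/env bash
# Sketch only (hypothetical helper): print every command read from a list
# file and, only when DRY_RUN=0, execute it.
#   $1 = command to run (e.g. deletevnic), $2 = list file, one argument set per line.
# DRY_RUN defaults to 1, i.e. print only.
run_from_list() {
  local cmd="$1" list="$2" line
  local -a args
  while read -r line; do
    if [ -z "$line" ]; then continue; fi   # skip blank lines
    read -r -a args <<<"$line"             # word-split, like the unquoted $line above
    echo "deleting: \"$cmd ${args[*]}\""
    if [ "${DRY_RUN:-1}" = "0" ]; then
      "$cmd" "${args[@]}" || echo "FAILED: $cmd ${args[*]}" >&2
    fi
  done <"$list"
}

# Usage on the switch: review the printed commands first, then rerun with DRY_RUN=0.
#   run_from_list deletevnic /tmp/vnics_to_delete
#   DRY_RUN=0 run_from_list deletevnic /tmp/vnics_to_delete
```

Making the print-only mode the default means a forgotten variable results in a harmless dry run rather than deleted VNICs.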
Note:
The procedure in this document assumes that all connectors except 0A-ETH-1 will be disabled. If a different connector is to become the sole active connection, modify the commands below accordingly.
Using the same kind of command as above, parse and delete the VLANs (instead of the VNICs) on every connector other than 0A-ETH-1:
showvlan | grep ETH | grep -v "0A-ETH-1" | awk '{print $1" "$2}' | sed 's/ / -vlan /' | while read line; do echo "deleting: \"deletevlan $line\""; deletevlan $line; done
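Before running the VLAN deletion loop, it can help to see exactly which argument lines it would pass to deletevlan. The following is a minimal sketch of the same awk/sed transformation: the function name vlan_delete_args is hypothetical, and, like the one-liner above, it assumes that column 1 of the showvlan output is the connector and column 2 is the VLAN ID. One deliberate difference: it compares column 1 exactly against the connector being kept, whereas grep -v "0A-ETH-1" drops any line that contains that string anywhere.

```bash
#!/usr/bin/env bash
# Sketch only (hypothetical helper): turn showvlan output on stdin into
# "CONNECTOR -vlan ID" argument lines for every connector except the kept one.
# Assumption (same as the one-liner in the note): $1 = connector, $2 = VLAN ID.
vlan_delete_args() {
  local keep="${1:-0A-ETH-1}"
  awk -v keep="$keep" '$1 ~ /ETH/ && $1 != keep { print $1 " -vlan " $2 }'
}

# Usage on the switch (bash assumed): review first, then delete.
#   showvlan | vlan_delete_args 0A-ETH-1
#   showvlan | vlan_delete_args 0A-ETH-1 | while read -r line; do
#     echo "deleting: \"deletevlan $line\""; deletevlan $line   # $line unquoted on purpose
#   done
```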
For example, the gateway port configuration on each switch is:
[root@enxl01sib001 ~]# showgwports

INTERNAL PORTS:
---------------
Device   Port Portname   PeerPort PortGUID           LID    IBState GWState
---------------------------------------------------------------------------
Bridge-0 1    Bridge-0-1 4        0x0010e0300cfcc001 0x0006 Active  Up
Bridge-0 2    Bridge-0-2 3        0x0010e0300cfcc002 0x0007 Active  Up
Bridge-1 1    Bridge-1-1 2        0x0010e0300cfcc041 0x0008 Active  Up
Bridge-1 2    Bridge-1-2 1        0x0010e0300cfcc042 0x0009 Active  Up

CONNECTOR 0A-ETH:
-----------------
Port     Bridge     Adminstate Link State Linkmode Speed
------------------------------------------------------------------------
0A-ETH-1 Bridge-0-2 Enabled    Up   Up    XFI      10Gb/s
0A-ETH-2 Bridge-0-2 Enabled    Up   Up    XFI      10Gb/s
0A-ETH-3 Bridge-0-1 Enabled    Down Reset XFI      10Gb/s
0A-ETH-4 Bridge-0-1 Enabled    Down Reset XFI      10Gb/s

CONNECTOR 1A-ETH:
-----------------
Port     Bridge     Adminstate Link State Linkmode Speed
------------------------------------------------------------------------
1A-ETH-1 Bridge-1-2 Enabled    Up   Up    XFI      10Gb/s
1A-ETH-2 Bridge-1-2 Enabled    Up   Up    XFI      10Gb/s
1A-ETH-3 Bridge-1-1 Enabled    Down Reset XFI      10Gb/s
1A-ETH-4 Bridge-1-1 Enabled    Down Reset XFI      10Gb/s

Use the disablegwport command for connectors 0A-ETH-2, 1A-ETH-1 and 1A-ETH-2. The command usage is:
[root@enxl01sib001 ~]# disablegwport
Usage disablegwport connector
Legal values for connector is: 0A-ETH-1, 0A-ETH-2, 0A-ETH-3, 0A-ETH-4, 0A-ETH,
1A-ETH-1, 1A-ETH-2, 1A-ETH-3, 1A-ETH-4, 1A-ETH,

So run:
[root@enxl01sib001 ~]# disablegwport 0A-ETH-2
[root@enxl01sib001 ~]# disablegwport 1A-ETH-1
[root@enxl01sib001 ~]# disablegwport 1A-ETH-2
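The choice of which ports to disable can also be derived from the showgwports output rather than read off by eye. The following is a minimal sketch, assuming the connector table layout shown in the sample above (Port, Bridge, Adminstate, Link, State, Linkmode, Speed) and bash on the switch; the function name gwports_to_disable is hypothetical. Like the example, it selects connector ports that are Enabled with Link Up, other than the one being kept; ports whose link is already Down (0A-ETH-3 and 0A-ETH-4 here) are not listed. It only prints the disablegwport commands so they can be reviewed before being run.

```bash
#!/usr/bin/env bash
# Sketch only (hypothetical helper): print the disablegwport commands implied
# by showgwports output on stdin.
# Assumed connector table columns (as in the sample above):
#   $1 Port  $2 Bridge  $3 Adminstate  $4 Link  $5 State  $6 Linkmode  $7 Speed
# Selects ports that are Enabled with Link Up, except the kept connector.
gwports_to_disable() {
  local keep="${1:-0A-ETH-1}"
  awk -v keep="$keep" '
    $1 ~ /^[0-9]+A-ETH-[0-9]+$/ && $1 != keep && $3 == "Enabled" && $4 == "Up" {
      print "disablegwport " $1
    }'
}

# Usage on the switch: print, review, run the commands, then rerun showgwports.
#   showgwports | gwports_to_disable 0A-ETH-1
```

For the sample output above, this prints the same three commands that the procedure runs.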
5. Select any 3 vServers to test the new configuration
6. Create 2 new vServers
7. Start all guest vServers
8. Leave Maintenance Window
References
<BUG:18175326> - INTERIM PATCH FOR BUGS 17294107, 17596353, 17934988, 18120474 FOR EL 2.0.6.0.0
<BUG:17934988> - PROBLEM WITH EOIB BETWEEN VSERVERS IN EXALOGIC 2.0.6.0.0
<BUG:17294107> - E2E: EMOC GENERATED HOSTNAMES CONTAIN UNDERSCORE -- AN ILLEGAL CHARACTER
<BUG:17596353> - EL 1/2 RACK WITH 2 POOLS. NEED TO SET DISTRIBUTION GROUP TO HAVE ALL 16 NODES
<BUG:18120474> - EOIB NETWORK ISSUES AND NOT ABLE TO PING VSERVER FROM ANOTHER ONE

Attachments
This solution has no attachment