Asset ID: |
1-72-2257037.1 |
Update Date: | 2018-03-06 |
Keywords: | |
Solution Type
Problem Resolution Sure
Solution 2257037.1: PCA: Creating a Custom External Network Fails Because the SSH Daemon on One Xsigo Switch Hangs
Related Items |
- Private Cloud Appliance X5-2 Hardware
|
Related Categories |
- PLA-Support>Eng Systems>Exalogic/OVCA>Oracle Virtual Compute Appliance>DB: OVCA_EST
|
Created from <SR 3-14640823301>
Applies to:
Private Cloud Appliance X5-2 Hardware - Version All Versions and later
Information in this document applies to any platform.
Symptoms
On PCA 2.2.2, the customer recently patched cables into port 2 on IO cards 4 and 5 and port 1 on IO cards 10 and 11 of the Xsigo (F1-15 Fabric Interconnect) switches. The same cabling was done on both Xsigo switches, ovcasw15r1 and ovcasw22r1. Creating a new custom network on the newly connected ports failed with the following errors:
PCA> create network TEST_PUB external_network '10:1 11:1'
Status: Failure
Error Message: Error (NETWORK_002): Exception while creating network: TEST_PUB. ["INVALID_SLOT_PORT_000: Invalid card slots or ports: ['\\xe2\\x80\\x9910:1']. Valid slots: ['4', '5', '10', '11']. Valid ports: ['1', '2', '3', '4'].", 'NOT_REQUIRED_NETWORK_ELEMENT_000: IP Prefix is not required for external_network network.']
PCA> create network TEST_PUB external_network 10:1 11:1
Status: Failure
Error Message: Error (NETWORK_002): Exception while creating network: TEST_PUB. ['NOT_REQUIRED_NETWORK_ELEMENT_000: IP Prefix is not required for external_network network.']
PCA> create network TEST_PUB external_network '10:1 11:1'
Status: Failure
Error Message: Error (NETWORK_002): Exception while creating network: TEST_PUB. Requires debugging of the two OFI and NM2 switches : IB subnet is not common across both IO directors
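The escaped bytes in the first INVALID_SLOT_PORT_000 message can be decoded to see what the CLI actually received. The Python sketch below does only that; `normalize_quotes` is a hypothetical helper added for illustration and is not part of the PCA CLI. The result suggests, as an assumption this note does not confirm, that the first command was pasted with a typographic quote instead of a plain ASCII apostrophe.

```python
import unicodedata

# Bytes exactly as they appear (escaped) in the INVALID_SLOT_PORT_000 message.
raw = b"\xe2\x80\x9910:1"
decoded = raw.decode("utf-8")

# Unicode name of the stray character in front of "10:1".
stray_name = unicodedata.name(decoded[0])
print(stray_name)


def normalize_quotes(text: str) -> str:
    """Hypothetical helper: map typographic quotes to their ASCII equivalents."""
    table = {"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'}
    return text.translate(str.maketrans(table))
```

Retyping the quotes by hand in the CLI session, rather than pasting from a word processor or e-mail, avoids this class of error.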
The list network-port command shows that ports 4:2, 5:2, 10:1, and 11:1 are in down status on ovcasw15r1, while they are up on ovcasw22r1.
PCA> list network-port --filter-column Type --filter nwEthernet* --sorted-by State
Port Director Type State Networks
---- -------- ---- ----- --------
4:2 ovcasw15r1 nwEthernet10GbPort down None <<<<<
4:3 ovcasw15r1 nwEthernet10GbPort down None
4:4 ovcasw15r1 nwEthernet10GbPort down None
5:2 ovcasw15r1 nwEthernet10GbPort down None <<<<<
5:3 ovcasw15r1 nwEthernet10GbPort down None
5:4 ovcasw15r1 nwEthernet10GbPort down None
10:1 ovcasw15r1 nwEthernet10GbPort down None <<<<<
10:2 ovcasw15r1 nwEthernet10GbPort down None
10:3 ovcasw15r1 nwEthernet10GbPort down None
10:4 ovcasw15r1 nwEthernet10GbPort down None
11:1 ovcasw15r1 nwEthernet10GbPort down None <<<<<
11:2 ovcasw15r1 nwEthernet10GbPort down None
11:3 ovcasw15r1 nwEthernet10GbPort down None
11:4 ovcasw15r1 nwEthernet10GbPort down None
4:3 ovcasw22r1 nwEthernet10GbPort down None
4:4 ovcasw22r1 nwEthernet10GbPort down None
5:3 ovcasw22r1 nwEthernet10GbPort down None
5:4 ovcasw22r1 nwEthernet10GbPort down None
10:2 ovcasw22r1 nwEthernet10GbPort down None
10:3 ovcasw22r1 nwEthernet10GbPort down None
10:4 ovcasw22r1 nwEthernet10GbPort down None
11:2 ovcasw22r1 nwEthernet10GbPort down None
11:3 ovcasw22r1 nwEthernet10GbPort down None
11:4 ovcasw22r1 nwEthernet10GbPort down None
4:1 ovcasw15r1 nwEthernet10GbPort up mgmt_public_eth, vm_public_vlan
5:1 ovcasw15r1 nwEthernet10GbPort up mgmt_public_eth, vm_public_vlan
4:1 ovcasw22r1 nwEthernet10GbPort up mgmt_public_eth, vm_public_vlan
4:2 ovcasw22r1 nwEthernet10GbPort up None
5:1 ovcasw22r1 nwEthernet10GbPort up mgmt_public_eth, vm_public_vlan
5:2 ovcasw22r1 nwEthernet10GbPort up None
10:1 ovcasw22r1 nwEthernet10GbPort up None
11:1 ovcasw22r1 nwEthernet10GbPort up None
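With this many rows, the ports whose state differs between the two directors are easy to miss. The following Python sketch parses saved `list network-port` output and reports those ports. It assumes the whitespace-separated layout shown above (port, director, type, state); the function names and the shortened `SAMPLE` text are illustrative only.

```python
from collections import defaultdict

# Shortened excerpt of the listing above, used only as sample input.
SAMPLE = """\
Port Director Type State Networks
---- -------- ---- ----- --------
4:2 ovcasw15r1 nwEthernet10GbPort down None <<<<<
4:3 ovcasw15r1 nwEthernet10GbPort down None
10:1 ovcasw15r1 nwEthernet10GbPort down None <<<<<
4:3 ovcasw22r1 nwEthernet10GbPort down None
4:1 ovcasw15r1 nwEthernet10GbPort up mgmt_public_eth, vm_public_vlan
4:1 ovcasw22r1 nwEthernet10GbPort up mgmt_public_eth, vm_public_vlan
4:2 ovcasw22r1 nwEthernet10GbPort up None
10:1 ovcasw22r1 nwEthernet10GbPort up None
"""


def parse_ports(listing: str) -> dict:
    """Return {port: {director: state}} for every data row in the listing."""
    states = defaultdict(dict)
    for line in listing.splitlines():
        fields = line.split()
        # Data rows start with "<slot>:<port>"; header and rule lines do not.
        if len(fields) < 4 or ":" not in fields[0]:
            continue
        if not fields[0].replace(":", "").isdigit():
            continue
        port, director, _port_type, state = fields[:4]
        states[port][director] = state
    return dict(states)


def mismatched_ports(states: dict) -> list:
    """Ports reported with different states by different directors."""
    differing = [p for p, by_dir in states.items() if len(set(by_dir.values())) > 1]
    return sorted(differing, key=lambda p: tuple(int(x) for x in p.split(":")))


print(mismatched_ports(parse_ports(SAMPLE)))
```

Applied to the full listing, the same logic would flag exactly the newly cabled ports, which is the pattern that points to one director reporting stale data.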
Pinging ovcasw15r1 succeeded, but SSH connections to it failed.
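A host that answers ping but not SSH can be detected by checking whether the SSH server sends its identification string, which an SSH server normally emits as soon as the TCP connection opens. The sketch below is one minimal way to do that from any machine with Python; the function name, default timeout, and port are assumptions for illustration, and it is not a PCA tool.

```python
import socket


def ssh_banner(host: str, port: int = 22, timeout: float = 5.0):
    """Return the SSH identification string, or None if none arrives in time.

    None covers refused connections, unresolvable hosts, and listeners that
    accept the connection but never send a banner (a hung daemon).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            data = sock.recv(256)
    except OSError:  # includes timeouts and connection errors
        return None
    text = data.decode("ascii", errors="replace").strip()
    return text if text.startswith("SSH-") else None


# Example call (hostnames taken from this note):
#   for director in ("ovcasw15r1", "ovcasw22r1"):
#       print(director, ssh_banner(director))
```

A `None` result for one director while the other returns a banner matches the symptom described here.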
Changes
Cause
When a custom network is created, the ports named in the 'create network' command must be up on both Xsigo switches for redundancy; otherwise, creation of the custom network fails.
The underlying PCA services obtain the IO port status by connecting to each Xsigo switch over SSH. Because the SSH daemon on ovcasw15r1 was hung, those services continued to treat the newly connected ports as down.
The port status appears to be kept in a Berkeley database that the PCA services refresh periodically. If a Xsigo switch cannot be reached over SSH, its records are simply not updated. That is why, on ovcasw15r1, the factory-connected ports 4:1 and 5:1 still show as up while the newly connected ports show as down.
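The stale-status behavior described above can be illustrated with a toy model. The Python sketch below is purely hypothetical: the class, function names, and refresh logic are invented for illustration and do not reproduce the actual PCA services or their database. It only shows why a store that skips updates when the switch is unreachable keeps old "up" entries and never learns about newly cabled ports.

```python
class PortStatusCache:
    """Toy status store that is updated only when the switch answers."""

    def __init__(self, initial):
        self.status = dict(initial)

    def refresh(self, fetch):
        """Update from fetch(); keep the old records if the switch is unreachable."""
        try:
            live = fetch()
        except ConnectionError:
            return False
        self.status.update(live)
        return True


def hung_ssh():
    # Stands in for an SSH session that cannot be established.
    raise ConnectionError("SSH daemon not responding")


def healthy_ssh():
    # Stands in for a successful status query after the switch recovers.
    return {"4:1": "up", "4:2": "up"}


# State recorded before port 4:2 was cabled.
cache = PortStatusCache({"4:1": "up", "4:2": "down"})

cache.refresh(hung_ssh)
stale = dict(cache.status)   # unchanged: 4:1 still up, 4:2 still down

cache.refresh(healthy_ssh)
fresh = dict(cache.status)   # both ports now up

print(stale, fresh)
```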
Solution
- Power cycle the Xsigo switch on which the SSH daemon is hung.
- Restart the ovca service stack by running 'service ovca restart' on the master management node.
- Create the network again.
In this case, the customer first power cycled ovcasw15r1, after which SSH to it worked again.
- Creating the network still failed. The output of 'PCA> list network-port --filter-column Type --filter nwEthernet* --sorted-by State' showed the newly connected IO ports still in down status.
- After a management node failover (restarting the ovca service on the master node should also work), the same command showed the newly connected IO ports in up status.
- Creating the network again then returned the following errors:
PCA> create network TEST_PUB external_network '10:1 11:1'
Status: Failure
Error Message: Error (NETWORK_002): Exception while creating network: TEST_PUB. 'Error setting status for row 10: data has been updated.'
PCA> create network NPNSP_PUB external_network '4:2 5:2'
Status: Failure
Error Message: Error (NETWORK_002): Exception while creating network: NPNSP_PUB. 'Error setting status for row 11: data has been updated.'
- The 'list network' command does not show these networks:
PCA> list network
Network_Name Default Type
------------ ------- ----
vm_public_vlan True external_network
NSP_PRIV False rack_internal_network
NPNSP_PRIV False rack_internal_network
mgmt_pvi True rack_internal_network
mgmt_public_eth True external_network
vm_private True rack_internal_network
NPNSP_NFS False external_network
----------------
7 rows displayed
- However, the 'show network' command still displays the network, with Status error:
PCA> show network TEST_PUB
----------------------------------------
Network_Name TEST_PUB
Trunkmode True
Description User defined network
Ports ['10:1', '11:1']
vNICs None
Status error
Network_Type external_network
Compute_Nodes ovcacn08r1, ovcacn07r1, ovcacn09r1
Prefix None
Netmask None
Route Destination None
Route Gateway None
----------------------------------------
Status: Success
This is because the earlier failed attempts to create the network left stale data behind.
- Delete the network and then create it again; this time the creation succeeds:
PCA> delete network TEST_PUB
PCA> create network TEST_PUB external_network '10:1 11:1'
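The leftover-entry condition described above (a network missing from 'list network' but still returned by 'show network' with Status error) can be checked from saved command output. The Python sketch below assumes the text layouts shown in this note; all function names are illustrative and the samples are shortened excerpts.

```python
LIST_SAMPLE = """\
Network_Name Default Type
------------ ------- ----
vm_public_vlan True external_network
mgmt_pvi True rack_internal_network
NPNSP_NFS False external_network
----------------
3 rows displayed
"""

SHOW_SAMPLE = """\
----------------------------------------
Network_Name TEST_PUB
Ports ['10:1', '11:1']
Status error
Network_Type external_network
----------------------------------------
Status: Success
"""


def listed_networks(list_output: str) -> set:
    """Names from 'list network' rows of the form '<name> <True|False> <type>'."""
    names = set()
    for line in list_output.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[1] in ("True", "False"):
            names.add(fields[0])
    return names


def show_field(show_output: str, field: str):
    """Value of a single-word field in 'show network' output, or None."""
    for line in show_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == field:
            return parts[1].strip()
    return None


def is_leftover(name: str, list_output: str, show_output: str) -> bool:
    """True if the network is absent from the list but shown in error status."""
    return (
        name not in listed_networks(list_output)
        and show_field(show_output, "Network_Name") == name
        and show_field(show_output, "Status") == "error"
    )


print(is_leftover("TEST_PUB", LIST_SAMPLE, SHOW_SAMPLE))
```

Any network flagged this way is a candidate for 'delete network' before the creation is retried.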