How to Replace a Failed InfiniBand (HCA) Card on an Exalogic Storage Node (Sun ZFS Storage 7320)

Asset ID:	1-71-1385308.1
Update Date:	2018-05-09
Keywords:

Solution Type Technical Instruction Sure

Solution 1385308.1 : How to Replace a Failed InfiniBand (HCA) Card on an Exalogic Storage Node (Sun ZFS Storage 7320)

Applies to:

Oracle ZFS Storage ZS3-4 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7320 - Version Not Applicable to Not Applicable [Release N/A]
Sun Infiniband HCA - Version Not Applicable to Not Applicable [Release N/A]
Oracle ZFS Storage ZS3-2 - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Goal

How to replace a failed InfiniBand card on an Exalogic storage node.

Solution

DISPATCH INSTRUCTIONS
- WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED:

The FSE needs to be Exalogic Trained.

- TIME ESTIMATE: 60 minutes
- TASK COMPLEXITY: 3

FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:
- PROBLEM OVERVIEW:
- WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE
RESOLUTION ACTIVITY?:

Verify the storage cluster status and determine whether the target storage node is currently hosting the cluster resources; if so, perform a failover.

- Log in to the target storage node.
- Verify the cluster status. One node should report AKCS_OWNER and the other AKCS_STRIPPED. If this is not the case, DO NOT CONTINUE with this process until you resolve any cluster issues, as failure to do so could result in a service outage.

Example of Active Storage Node (state = AKCS_OWNER):

sn01:> configuration cluster show
Properties:
                         state = AKCS_OWNER
                   description = Active (takeover completed)
                      peer_asn = d6df4e45-3677-4ac0-9aaa-90746df9d6a5
                 peer_hostname = sn02
                    peer_state = AKCS_STRIPPED
              peer_description = Ready (waiting for failback)

Children:
                        resources => Configure resources

Example of Passive Storage Node (state = AKCS_STRIPPED):

sn02:> configuration cluster show
Properties:
                         state = AKCS_STRIPPED
                   description = Ready (waiting for failback)
                      peer_asn = eeac79d6-5822-6ca6-e4dd-c68b25265f21
                 peer_hostname = sn01
                    peer_state = AKCS_OWNER
              peer_description = Active (takeover completed)

Children:
                        resources => Configure resources
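
The status check above can also be applied to captured CLI output. The following is a minimal Python sketch, assuming the output of 'configuration cluster show' has been saved as text in the format shown in the examples. The function names parse_cluster_show and cluster_is_healthy are hypothetical helpers, not part of the appliance software.

```python
import re

def parse_cluster_show(text):
    """Collect the 'key = value' properties from 'configuration cluster show' output."""
    props = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\w+)\s+=\s+(.*\S)\s*$", line)
        if match:
            props[match.group(1)] = match.group(2)
    return props

def cluster_is_healthy(props):
    """True when one node is AKCS_OWNER and the other is AKCS_STRIPPED."""
    states = {props.get("state"), props.get("peer_state")}
    return states == {"AKCS_OWNER", "AKCS_STRIPPED"}

# Sample text copied from the active-node example in this document.
sample = """\
Properties:
                         state = AKCS_OWNER
                   description = Active (takeover completed)
                      peer_asn = d6df4e45-3677-4ac0-9aaa-90746df9d6a5
                 peer_hostname = sn02
                    peer_state = AKCS_STRIPPED
              peer_description = Ready (waiting for failback)
"""

props = parse_cluster_show(sample)
print(props["state"], props["peer_state"], cluster_is_healthy(props))
```

If cluster_is_healthy returns False, stop and resolve the cluster issue before continuing, as described in the step above.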

- Determine if the target node is currently hosting the clustered resources (see the previous step). If it is, there are two options: shut down the target node (which forces resource failover to the alternate node), or force a 'takeover' operation from the current owner (AKCS_OWNER) node.

Example of Node shutdown:

sn01:> maintenance system poweroff <CR>
This will turn off power to the appliance. Are you sure? (Y/N)

Example of a forced Storage Node takeover:

On the current active node:
sn01:> maintenance system reboot 
This will reboot the appliance. Are you sure? (Y/N) 

Note: As per the document "Failback" and "Takeover" Cluster Operation Supportability on ZFS Storage Appliance in Exalogic Rack (Doc ID 2091131.1), rebooting the active node is the method used within Exalogic to effect takeovers.

Note: Rebooting the original owner forces a 'takeover' operation, which fails the clustered resources over to the peer node. You will then either need to wait for the reboot to complete and shut down the node, or intercept the reboot operation and force a poweroff via ILOM (stop /SYS).

- If resource failover was required, check the alternate node to ensure the cluster resources migrated successfully (state should transition from AKCS_STRIPPED to AKCS_OWNER).

- WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE:

Please see the "PCIe Cards and Risers" replacement procedures located in "Sun ZFS Storage 7x20 Appliance Customer Service Manual".

- Power-off the target node for service.
- Pull the InfiniBand Cables from the IB Card at the rear of the server.
- Transition the target node to the service position.
- Remove the top cover.
- Locate and Remove the PCIe Riser that includes the IB Card.
- Remove and Replace the defective IB Card.
- Re-Install the PCIe Riser.
- Install the top cover.
- Slide the node back into the rack operating position.
- Power on the system either via ILOM or via the push button on the front of the server.

Update the partition map on the InfiniBand master switch with the GUIDs of the new IB-HCA.

1. Determine the port GUIDs of the new IB-HCA with the ibstat command run on the Storage node.
Record the “Port GUID” that is returned for both ports.

Example:
# ibstat
CA 'mlx4_0'
CA type: MT26428
Number of ports: 2
Firmware version: 2.7.8130
Hardware version: b0
Node GUID: 0x0021280001ef5d22
System image GUID: 0x0021280001ef5d25
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 32
LMC: 0
SM lid: 8
Capability mask: 0x02510868
Port GUID: 0x0021280001ef5d23
Link layer: IB
Port 2:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 33
LMC: 0
SM lid: 8
Capability mask: 0x02510868
Port GUID: 0x0021280001ef5d24
Link layer: IB
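
As an illustration of step 1, the Port GUID values can be pulled out of saved ibstat output rather than copied by hand. This is a minimal Python sketch, assuming the output has been captured as text in the format shown above. The helper names port_guids and to_smpartition_form are hypothetical, and the short form (no 0x prefix, no leading zeros) simply mirrors how GUIDs are written in the smpartition examples in this document.

```python
import re

def port_guids(ibstat_output):
    """Return every 'Port GUID' value, in the order the ports are listed."""
    pattern = r"^\s*Port GUID:\s*(0x[0-9a-fA-F]+)\s*$"
    return re.findall(pattern, ibstat_output, flags=re.MULTILINE)

def to_smpartition_form(guid):
    """Drop the 0x prefix and leading zeros, e.g. 0x0021... -> 21..."""
    return format(int(guid, 16), "x")

# Abbreviated copy of the ibstat example above.
sample = """\
CA 'mlx4_0'
Node GUID: 0x0021280001ef5d22
System image GUID: 0x0021280001ef5d25
Port 1:
State: Active
Port GUID: 0x0021280001ef5d23
Port 2:
State: Active
Port GUID: 0x0021280001ef5d24
"""

guids = port_guids(sample)
short_guids = [to_smpartition_form(g) for g in guids]
print(guids)
print(short_guids)
```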

2. Determine which Gateway switch is the master using the 'getmaster' command.

3. Log in to the master switch. This is the only place the partition will be modified; the change will be propagated once committed.

4. Add the port GUIDs of the new IB-HCA to the partitions that are being used by the node.
The following example is for the partition with pkey 503 (or 8503) and is run only on the master switch:

# smpartition start
# smpartition add -pkey 0x503 -port 21280001ef5d24 21280001ef5d23 -m both

NOTE: The -m switch sets the mode needed for the partition. If a default mode is configured on the partition and these GUIDs will use it, you do not need -m; however, specifying it will not cause any issues. If NO default is configured on the partition, you will need to set the mode to one of the following:

both, limited, or full

NOTE: Add the two new Port GUIDs to all the partitions that are needed.
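
When the node belongs to several partitions, the same 'smpartition add' line has to be repeated for each pkey. The following Python sketch only prints the command text for review; it does not run anything on the switch. It uses the command form from the example above; the function name add_commands is hypothetical, and the second pkey (0x504) is an invented placeholder, not a value from this document.

```python
def add_commands(pkeys, port_guids, mode="both"):
    """Build one 'smpartition add' command line per partition key."""
    ports = " ".join(port_guids)
    return [
        f"smpartition add -pkey {pkey} -port {ports} -m {mode}"
        for pkey in pkeys
    ]

# Port GUIDs of the new card, taken from the ibstat example in this document.
new_ports = ["21280001ef5d24", "21280001ef5d23"]

# 0x503 comes from the example above; 0x504 is a hypothetical second partition.
commands = add_commands(["0x503", "0x504"], new_ports)
for command in commands:
    print(command)
```

The printed lines are meant to be entered on the master switch between 'smpartition start' and 'smpartition commit'.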

Remove the port GUIDs of the card being replaced from the partition maps. The port GUIDs of the faulty card are the card's Node GUID + 1 and Node GUID + 2. The Node GUID can be obtained with the ibstat command before removing the card, or read from the label printed on the card itself if it has already been removed.

You can remove these GUIDs from the active partitions using the following command:

Example for a card with Node GUID 21280001ef1233

# smpartition remove -pkey 0x503 -port 21280001ef1234 21280001ef1235

NOTE: Remove the GUIDs for all partitions the faulty card was a member of.
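
The "Node GUID + 1 and + 2" rule is hexadecimal arithmetic, which is easy to get wrong by hand when the last digit carries. A minimal Python sketch of the calculation follows; the function name faulty_port_guids is a hypothetical helper.

```python
def faulty_port_guids(node_guid):
    """Return the two port GUIDs (Node GUID + 1 and + 2) as hex strings."""
    base = int(node_guid, 16)  # accepts values with or without a 0x prefix
    return [format(base + offset, "x") for offset in (1, 2)]

# Node GUID from the removal example in this document.
old_ports = faulty_port_guids("21280001ef1233")
print(old_ports)  # ['21280001ef1234', '21280001ef1235']

# The same rule applied to the Node GUID in the ibstat example.
print(faulty_port_guids("0x0021280001ef5d22"))
```

The first result matches the GUIDs passed to 'smpartition remove' in the example above, and the second matches the two Port GUID lines in the ibstat output.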

Verify the partition information is correct:

# smpartition list modified

If everything is correct:

# smpartition commit

5. Verify the entries using:

# smpartition list active

0x0021280001ef5d24=both,

0x0021280001ef5d23=both,

Ensure the GUIDs are in all the partitions in which the node is required to communicate.
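
The check above can be sketched in Python against a saved copy of the 'smpartition list active' output. This assumes each member appears as a 0x-prefixed GUID followed by '=mode', as in the two lines shown above; the full layout of the listing is not reproduced in this document, so the sketch does not separate entries by partition. The names listed_guids and missing_guids are hypothetical.

```python
import re

def listed_guids(listing):
    """Map each GUID in the listing (as an integer) to its membership mode."""
    pairs = re.findall(r"(0x[0-9a-fA-F]+)=(\w+)", listing)
    return {int(guid, 16): mode for guid, mode in pairs}

def missing_guids(listing, expected):
    """Return the expected GUIDs that do not appear anywhere in the listing."""
    present = listed_guids(listing)
    return [guid for guid in expected if int(guid, 16) not in present]

# The two entries shown in the verification step above.
listing = """\
0x0021280001ef5d24=both,

0x0021280001ef5d23=both,
"""

new_ports = ["21280001ef5d23", "21280001ef5d24"]
old_ports = ["21280001ef1234", "21280001ef1235"]

print(missing_guids(listing, new_ports))  # new card: nothing should be missing
print(missing_guids(listing, old_ports))  # old card: both should be absent
```

Comparing GUIDs as integers avoids false mismatches between the short form used on the command line and the zero-padded form in the listing.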

OBTAIN CUSTOMER ACCEPTANCE
- WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:

The customer is responsible for verifying that the new component is functioning correctly. Some steps the customer may want to use for verification are as follows.

To verify the operation of the newly replaced IB Card, the customer will need to take over the clustered resources from the current owner and test access to the storage appliance.

Here are some suggestions for basic verification:

Basic Testing:

- Log in to the target node.
- Perform a 'takeover' operation for the clustered resources. This should be done from the node that was serviced (replaced IB Card).

sn01:configuration cluster> takeover <CR>
Continuing will immediately fail back the resources assigned to the cluster
peer. This may result in clients experiencing a slight delay in service.

Are you sure? (Y/N)

- Verify all resources migrated as expected.

sn01:> configuration cluster show
Properties:
                         state = AKCS_OWNER
                   description = Active (takeover completed)
                      peer_asn = d6df4e45-3677-4ac0-9aaa-90746df9d6a5
                 peer_hostname = sn02
                    peer_state = AKCS_STRIPPED
              peer_description = Ready (waiting for failback)

Children:
                        resources => Configure resources

Note: The steps above constitute a basic test assuming the clustered resources failed back normally. The state reported in the above example depicts the normal operating status of a Storage Cluster within an Exalogic as seen from the Active node.

Additional Testing:

- Execute the following commands from the storage node to verify all datalinks and interfaces are up.

sn01:> configuration net datalinks show
Datalinks:

DATALINK    CLASS          LINKS       STATE   LABEL
igb0        device         igb0        up      igb0
igb1        device         igb1        up      igb1
ibp0        device         ibp0        up      ibp0
ibp1        device         ibp1        up      ibp1

sn01:> configuration net interfaces show
Interfaces:


INTERFACE   STATE    CLASS LINKS       ADDRS                  LABEL
igb0        up       ip    igb0        10.10.10.10/24         igb0
igb1        offline  ip    igb1        10.10.10.11/24         igb1
ipmp1       up       ipmp  ibp0        192.168.10.15/24       IB_Interface
                           ibp1
ibp0        up       ip    ibp0        0.0.0.0/8              ibp0
ibp1        up       ip    ibp1        0.0.0.0/8              ibp1

- To completely test both InfiniBand ports on the IB card, you could disable each physical interface one at a time, or physically disconnect each port one at a time, while monitoring access to the appliance.

Test physical links in IPMP group:

sn01:> configuration net interfaces 
sn01:configuration net interfaces> show
Interfaces:

INTERFACE   STATE    CLASS LINKS       ADDRS                  LABEL
igb0        up       ip    igb0        10.10.10.10/24         igb0
igb1        offline  ip    igb1        10.10.10.11/24         igb1
ipmp1       up       ipmp  ibp0        192.168.10.15/24       IB_Interface
                           ibp1                         
ibp0        up       ip    ibp0        0.0.0.0/8              ibp0
ibp1        up       ip    ibp1        0.0.0.0/8              ibp1

sn01:configuration net interfaces> select ibp0
sn01:configuration net interfaces ibp0> show
Properties:
                         state = up
                      curaddrs = 0.0.0.0/8
                         class = ip
                         label = ibp0
                        enable = true
                         admin = true
                         links = ibp0
                       v4addrs = 0.0.0.0/8
                        v4dhcp = false
                       v6addrs = 
                        v6dhcp = false

sn01:configuration net interfaces ibp0> set enable=false
                        enable = false (uncommitted)
sn01:configuration net interfaces ibp0> commit
sn01:configuration net interfaces ibp0> cd ..
sn01:configuration net interfaces> show
Interfaces:

INTERFACE   STATE    CLASS LINKS       ADDRS                  LABEL
igb0        up       ip    igb0        10.10.10.10/24         igb0
igb1        offline  ip    igb1        10.10.10.11/24         igb1
ipmp1       up       ipmp  ibp0        192.168.10.15/24       IB_Interface
                           ibp1                               
ibp0        disabled ip    ibp0        0.0.0.0/8              ibp0
ibp1        up       ip    ibp1        0.0.0.0/8              ibp1

sn01:configuration net interfaces> select ibp0
sn01:configuration net interfaces ibp0> set enable=true
                        enable = true (uncommitted)
sn01:configuration net interfaces ibp0> commit
sn01:configuration net interfaces ibp0> cd ..
sn01:configuration net interfaces> show
Interfaces:

INTERFACE   STATE    CLASS LINKS       ADDRS                  LABEL
igb0        up       ip    igb0        10.10.10.10/24         igb0
igb1        offline  ip    igb1        10.10.10.11/24         igb1
ipmp1       up       ipmp  ibp0        192.168.10.15/24       IB_Interface
                           ibp1                         
ibp0        up       ip    ibp0        0.0.0.0/8              ibp0
ibp1        up       ip    ibp1        0.0.0.0/8              ibp1

sn01:configuration net interfaces>

Note: Perform the same set of commands above for interface "ibp1" as well. This will test both links in the IPMP group "ipmp1".
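
When repeating the disable/enable cycle for each link, the interface table can be checked from saved output instead of by eye. This is a minimal Python sketch, assuming the 'show' output uses the six-column layout seen in the transcripts above; the function names interface_states and ib_not_up are hypothetical, and the ibp/ipmp name prefixes are taken from this document's examples.

```python
def interface_states(show_output):
    """Map each interface name to its STATE column."""
    states = {}
    for line in show_output.splitlines():
        fields = line.split()
        # Data rows have six columns; this skips the "Interfaces:" title,
        # the header row, and continuation rows that list an extra link.
        if len(fields) >= 6 and fields[0] != "INTERFACE":
            states[fields[0]] = fields[1]
    return states

def ib_not_up(states):
    """Return InfiniBand-related interfaces whose state is not 'up'."""
    return [
        name for name, state in states.items()
        if name.startswith(("ibp", "ipmp")) and state != "up"
    ]

# Copy of the table shown while ibp0 is disabled.
sample = """\
Interfaces:

INTERFACE   STATE    CLASS LINKS       ADDRS                  LABEL
igb0        up       ip    igb0        10.10.10.10/24         igb0
igb1        offline  ip    igb1        10.10.10.11/24         igb1
ipmp1       up       ipmp  ibp0        192.168.10.15/24       IB_Interface
                           ibp1
ibp0        disabled ip    ibp0        0.0.0.0/8              ibp0
ibp1        up       ip    ibp1        0.0.0.0/8              ibp1
"""

states = interface_states(sample)
print(ib_not_up(states))
```

During the test, exactly one of ibp0 or ibp1 should be reported while it is disabled, and the list should be empty once the interface has been re-enabled.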

- Log in to one or more compute nodes and verify access to the storage appliance over the InfiniBand network. Depending on the level of testing desired, you could perform one or more of the tests from the following list:

ping test
verify access to storage appliance shares
copy files between compute node and storage node

PARTS NOTE:

REFERENCE INFORMATION:

Exalogic Machine Owner's Guide: https://docs.oracle.com/cd/E18476_01/index.htm

Sun ZFS Storage 7320 System Documentation: https://docs.oracle.com/cd/E22471_01/index.html

Sun ZFS Storage 7x20 Appliance Customer Service Manual: https://docs.oracle.com/cd/E22471_01/html/821-1792/index.html

References

<NOTE:2091131.1> - "Failback" and "Takeover" Cluster Operation Supportability on ZFS Storage Appliance in Exalogic Rack

Attachments

This solution has no attachment