Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-2129445.1
Update Date:2017-09-07

Solution Type  Technical Instruction Sure

Solution  2129445.1 :   VSM6 or VSM7 - How to Replace a Defective SAS HBA  


Related Items
  • StorageTek Virtual Storage Manager System 7 (VSM7)
  • StorageTek Virtual Storage Manager System 6 (VSM6)
Related Categories
  • PLA-Support>Sun Systems>TAPE>Virtual Tape>SN-TP: VSM6

In this Document
Goal
Solution
References


Oracle Confidential PARTNER - Available to partners (SUN).
Reason: Need solution published
Created from <SR 3-12547222950>

Applies to:

StorageTek Virtual Storage Manager System 6 (VSM6) - Version All Versions and later
StorageTek Virtual Storage Manager System 7 (VSM7) - Version 7.0.0 and later
Information in this document applies to any platform.

Goal

VSM6 or VSM7: How to replace a defective SAS HBA
 

Solution

DISPATCH INSTRUCTIONS
WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED?: VSM6 trained, T4 server, Solaris 11
TIME ESTIMATE: Approximately 90-120 minutes
TASK COMPLEXITY: 3

SAS HBA Card Replacement Steps Overview
1. Disable ASR
2. Check to see if the PCI card has been retired
3. Shut down the node and set it for maintenance
4. Replace component(s)
5. Power system on into maintenance mode
6. Clear FMA faults
7. Reboot the node into normal mode and check the VTSS state
8. Enable ASR

FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:
PROBLEM OVERVIEW: How to replace a SAS HBA card in a VSM6 or VSM7 server.

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?:

Prepare affected VSM6 or VSM7 server for servicing:

  1. Check with the customer whether the RTDs are dual-pathed. If not, take the RTD paths offline for any RTDs that are attached only to the node that will be stopped.
  2. Verify the field task identifies which node and which SAS HBA within the node needs to be replaced.
    Logon information: ID = vsmadm (default password = vsm6admin).

    CAUTION: Be careful to complete all steps.

    Disable ASR on both nodes:

  3. Disable ASR:
    sudo /opt/vsm/bin/asr_sfb_cfg.pl ASR_DISABLE
    ssh other-vtss-node
    sudo /opt/vsm/bin/asr_sfb_cfg.pl ASR_DISABLE
    exit

    Check to see if the PCI card has been retired:

  4. See if the retire_store file exists:
    ls -l /etc/devices/retire_store

    If it exists, remove the file:
    sudo rm /etc/devices/retire_store
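
    The check and removal in step 4 can be combined into one guarded command. The following is a minimal sketch, not part of the official procedure. The function name remove_retire_store and the SUDO variable are hypothetical; SUDO defaults to sudo and can be set to an empty string when trying the function against a scratch file.

```shell
# Sketch only: remove the retire_store file if, and only if, it exists.
# SUDO is a hypothetical override; it defaults to "sudo" as used in step 4.
SUDO="${SUDO-sudo}"

remove_retire_store() {
    # Default to the path named in step 4 when no argument is given.
    file="${1:-/etc/devices/retire_store}"
    if [ -e "$file" ]; then
        ls -l "$file"
        $SUDO rm "$file" && echo "removed $file"
    else
        echo "$file not present, nothing to remove"
    fi
}
```

    Calling remove_retire_store with no argument acts on /etc/devices/retire_store.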

    Shut down the node and set for maintenance:

  5. Use the CLI shutdown command to shut down the node identified in step 2. This shuts down all processes running on that node, sets the node for maintenance upon restart, and then powers off the node. The other node remains operational.
    sudo /opt/vsm/bin/vsm_cli_client -c "shutdown node -maint -node X"
    Where X = the number of the node containing the suspect SAS HBA (1 or 2)
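
    Because shutting down the wrong node takes the healthy node out of service, it may help to validate the node number before running the command. The sketch below only prints the command for review; it does not execute it. The function name build_shutdown_cmd is hypothetical.

```shell
# Sketch only: print the step 5 shutdown command for node 1 or 2.
# Any other argument is rejected so that a typo cannot reach the CLI.
build_shutdown_cmd() {
    case "$1" in
        1|2)
            printf 'sudo /opt/vsm/bin/vsm_cli_client -c "shutdown node -maint -node %s"\n' "$1"
            ;;
        *)
            echo "node must be 1 or 2" >&2
            return 1
            ;;
    esac
}
```

    For example, build_shutdown_cmd 2 prints the command for node 2, which can then be checked against the field task before it is run.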

    Replace the SAS HBA:

  6. Per the SPARC T4-2 Server Service Manual, follow all ESD precautions, remove the power cords, and replace the fenced component. Exercise extreme care when sliding the server forward to avoid damaging any cables, connections, or connectors. Note that while the server is down, other fenced components can be replaced at the same time.
    SAS HBAs are installed in slot 3 and slot 6 in the VSM6.
    SAS HBAs are installed in slot 3 and slot 5 in the VSM7.

    Once the physical replacement is completed, carefully slide the server back into the rack and reconnect the cables in their original positions.

    Power system on into maintenance mode:

  7. Per the SPARC T4-2 Server Service Manual, connect to the ILOM and log in to power the node back on. Note that the node will be operational but will not be running VSM code upon restart. In other words, log in to the ILOM and start the system:
    start /SYS

    Clear FMA faults:

  8. If the UUID for the failure was provided in the field task, clear the FMA fault and confirm that it is Resolved.
    If the UUID was not included in the field task, check whether one exists with the following command:
    sudo fmadm faulty

    Check the output of the above command for a UUID for the SAS HBA.

    Example of acquitting a UUID:
    sudo fmadm acquit 52457015-fc76-e975-c874-8dad3f6aa37c
    fmadm: recorded acquittal of 52457015-fc76-e975-c874-8dad3f6aa37c

    NOTE: Your ‘fmadm acquit’ command must use the UUID for the failure you are fixing, not the UUID given in the example above.

    Confirm that the fault is now reported as Resolved:
    sudo fmdump | grep 52457015-fc76-e975-c874-8dad3f6aa37c
    Jul 25 12:56:43.1716 52457015-fc76-e975-c874-8dad3f6aa37c PCIEX-8000-ND Diagnosed
    Jul 25 16:36:03.4187 52457015-fc76-e975-c874-8dad3f6aa37c FMD-8000-4M Repaired
    Jul 25 16:36:04.1891 52457015-fc76-e975-c874-8dad3f6aa37c FMD-8000-6U Resolved
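
    The Resolved check can also be scripted instead of read by eye. The following is a minimal sketch that assumes fmdump output has the column layout shown in the example above (date, time, UUID, message ID, state). The function name uuid_resolved is hypothetical.

```shell
# Sketch only: read fmdump-style lines on stdin and succeed (exit 0)
# if any line for the given UUID has "Resolved" as its last field.
# Assumes the UUID is the fourth whitespace-separated field, as in the example.
uuid_resolved() {
    awk -v id="$1" '
        $4 == id && $NF == "Resolved" { found = 1 }
        END { exit !found }
    '
}
```

    A possible invocation is: sudo fmdump | uuid_resolved <UUID> && echo "fault resolved", where <UUID> is the UUID of the failure being fixed.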

    Reboot the node into normal mode and check the VTSS state:

  9. Reboot into normal mode:
    sync; sync; resetNode 2 1
  10. Verify the node comes back up and all resources come online and are owned by the correct node.
    /usr/cluster/bin/scstat -g
  11. Run a health check and verify that the VTSS is in good condition.
    sudo /opt/vsm/bin/vsm_HealthCheck

    Enable ASR on both nodes:

  12. Enable ASR:
    sudo /opt/vsm/bin/asr_sfb_cfg.pl ASR_ENABLE
    ssh other-vtss-node
    sudo /opt/vsm/bin/asr_sfb_cfg.pl ASR_ENABLE
    exit
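
    Steps 3 and 12 run the same command on both nodes, so it is easy to forget the peer. The sketch below wraps the pair in one function. It rests on several assumptions: the names asr_both_nodes, PEER, and RUN are hypothetical, other-vtss-node is the placeholder host name used in this document, and sudo on the peer is assumed to work over a single ssh -t invocation. By default the function only prints the commands, so they can be reviewed first.

```shell
# Sketch only: apply one ASR action to the local node and then to the peer.
# PEER is a hypothetical variable holding the peer host name.
# RUN defaults to "echo" (dry run); set RUN="" to execute the commands.
PEER="${PEER:-other-vtss-node}"
RUN="${RUN-echo}"

asr_both_nodes() {
    action="$1"
    case "$action" in
        ASR_ENABLE|ASR_DISABLE) ;;
        *)
            echo "usage: asr_both_nodes ASR_ENABLE|ASR_DISABLE" >&2
            return 1
            ;;
    esac
    # Local node first, then the peer, matching the order used in steps 3 and 12.
    $RUN sudo /opt/vsm/bin/asr_sfb_cfg.pl "$action"
    $RUN ssh -t "$PEER" sudo /opt/vsm/bin/asr_sfb_cfg.pl "$action"
}
```

    Running asr_both_nodes ASR_ENABLE in the default dry-run mode prints the two commands without executing them.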

WHAT ACTION DOES THE FE/CUSTOMER NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:
Test functionality of system:
1. Run the Solaris "sudo fmadm faulty" command and the SP/ILOM "show faulty" command (if only ALOM is supported, run the "showfaults -v" command) to verify that the fault has been cleared.
2. Perform one of the following tasks based on your verification results:
* If the previous steps did not clear the fault, refer to doc 1004229.1 for information about the tools and methods you can use to diagnose and clear component faults.
* If the previous steps indicate that no faults have been detected, the component has been replaced successfully. No further action is required.

OBTAIN CUSTOMER ACCEPTANCE

PARTS NOTE:

REFERENCE INFORMATION:

 

References

<NOTE:1471530.1> - VLE/VSM - Where to find Product Manuals, User Guides and Documentation
<NOTE:2079358.1> - VSM6 - One or more devices has a SAS path problem

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.