Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-2325988.1
Update Date: 2018-05-14
Keywords:

Solution Type: Technical Instruction

Solution 2325988.1: How to Replace an Oracle Server X7-2 NVMe Disk


Related Items
  • Oracle Server X7-2
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: x64-CAP VCAP




In this Document
Goal
Solution
 How to replace an NVMe disk on the Oracle Solaris Operating System
 How to replace an NVMe disk on the Oracle Linux Operating System
 How to replace an NVMe disk on Microsoft Windows Server
 
REFERENCE INFORMATION:


Applies to:

Oracle Server X7-2 - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Goal

How to Replace an Oracle Server X7-2 NVMe Disk.

Solution

DISPATCH INSTRUCTIONS

WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED?:
No special skills required; this is a Customer Replaceable Unit (CRU) procedure.

TIME ESTIMATE: 30 minutes

TASK COMPLEXITY: 0

FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:

PROBLEM OVERVIEW: An Oracle Server X7-2 NVMe Disk needs replacement

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?:

It is expected that the Oracle Server X7-2 is up and running, with the operating system booted and the failed drive still available to the operating system.

Before proceeding, confirm the part number of the part in hand (either from logistics or an on-site spare) matches the part number dispatched for replacement.

The following commands are provided as a guide in case the customer needs assistance checking the system prior to replacement. If the customer or FSE requires more assistance prior to the physical replacement of the device, X86 HW TSC should be contacted.

 

NVMe storage drives are supported only on Oracle Server X7-2 systems running Oracle Solaris, Oracle Linux, Oracle VM, or Microsoft Windows Server. Servers running Red Hat Enterprise Linux do not support NVMe drives.

How to replace an NVMe disk on the Oracle Solaris Operating System


Before you begin, the Solaris hotplug daemon must be enabled on the host.
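If the customer is unsure whether the hotplug daemon is running, it can be checked and, if necessary, enabled as shown below. This is a minimal sketch assuming Oracle Solaris 11, where the daemon is delivered as the svc:/system/hotplug:default SMF service; confirm the service name on the target system.

    Check the state of the hotplug service (it should report "online"):
    # svcs hotplug

    Enable the service if it is reported as disabled:
    # svcadm enable hotplug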

  NVMe Storage Drive Virtual PCIe Slot Designation

If NVMe storage drives are installed, they are labeled on the system front panel as NVMe0, NVMe1, NVMe2, NVMe3, NVMe4, NVMe5, NVMe6 and NVMe7. However, the server BIOS internally identifies these drives by their virtual PCIe slot numbers. When using operating system commands to power NVMe drives off before removal, you need to know the virtual PCIe slot number of the drive.

The following table lists the drive front panel label and its corresponding virtual PCIe slot number used by the operating system.

Front Panel Storage Drive Label    Virtual PCIe Slot Number
NVMe0                              PCIe slot 100
NVMe1                              PCIe slot 101
NVMe2                              PCIe slot 102
NVMe3                              PCIe slot 103
NVMe4                              PCIe slot 104
NVMe5                              PCIe slot 105
NVMe6                              PCIe slot 106
NVMe7                              PCIe slot 107

Note that the virtual PCIe slot name is not the same as the name on the server front panel label. The drive names provided in the table assume that NVMe cabling between the motherboard NVMe connectors and the disk backplane is correct.

  1. Log in to the Oracle Solaris host.
  2. Find the NVMe drive virtual PCIe slot number using the "hotplug list -lc" command: 
    # hotplug list -lc
    Connection State   Description Path                          
    -------------------------------------------------------------------------------------
    Slot100    ENABLED PCIe-Native /pci@13,0/pci8086,2030@0/pci111d,80b5@0/pci111d,80b5@5
  3. Prepare the NVMe drive for removal by powering off the drive slot using the "hotplug poweroff" command. In the following example, we power off the NVMe drive in slot NVMe0 (PCIe slot 100):
    # hotplug poweroff Slot100

    You can see that the drive is now powered off using the "hotplug list" command. The powered-off drive will have a state of "PRESENT".

     # hotplug list -lc
    Connection State   Description Path                          
    -------------------------------------------------------------------------------------
    Slot100    PRESENT PCIe-Native /pci@13,0/pci8086,2030@0/pci111d,80b5@0/pci111d,80b5@5
  4. Verify that the blue OK to Remove indicator on the NVMe drive is lit.
  5. On the drive you plan to remove, push the latch release button to open the drive latch.
  6. Grasp the latch and pull the drive out of the drive slot.
  7. Verify that the NVMe drive has been removed; in the "hotplug list -lc" output, the slot should now report "EMPTY".
    # hotplug list -lc
    Connection State Description Path                          
    -----------------------------------------------------------------------------------
    Slot100    EMPTY PCIe-Native /pci@13,0/pci8086,2030@0/pci111d,80b5@0/pci111d,80b5@5
  8. Align the replacement drive with the drive slot.
  9. Slide the drive into the slot until the drive is fully seated.
  10. Close the drive latch to lock the drive in place.
  11. Power on the slot for the drive with the "hotplug enable" command. (The system may perform this step automatically; if it does not, run this command.)
    # hotplug enable Slot100

      Confirm that the drive has been enabled and is seen by the system.

    # hotplug list -lc
    Connection State   Description Path                          
    -------------------------------------------------------------------------------------
    Slot100    Enabled PCIe-Native /pci@13,0/pci8086,2030@0/pci111d,80b5@0/pci111d,80b5@5
  12. To check NVMe drive health, firmware level, and temperature, retrieve the error log and SMART data, or perform a low-level format, use nvmeadm commands such as the following:
    # nvmeadm list
    SUNW-NVME-1
    # nvmeadm getlog -h SUNW-NVME-1
    SUNW-NVME-1
    SMART/Health Information:
            Critical Warning: 0
            Temperature: 297 Kelvin
            Available Spare: 100 percent
            Available Spare Threshold: 10 percent
            Percentage Used: 0 percent
            Data Unit Read: 0x8e467e85 of 512k bytes.
            Data Unit Written: 0x28af3dbf of 512k bytes.
            Number of Host Read Commands: 0x5c7f318
            Number of Host Write Commands: 0x3c02fe4
            Controller Busy Time in Minutes: 0x3
            Number of Power Cycle: 0x488
            Number of Power On Hours: 0xe3e
            Number of Unsafe Shutdown: 0x484
            Number of Media Errors: 0x0
            Number of Error Info Log Entries: 0x0

 


How to replace an NVMe disk on the Oracle Linux Operating System

Linux NVMe hot plug requires that the kernel boot argument "pci=pcie_bus_perf" be set in order to get the proper MPS (MaxPayloadSize) and MRR (MaxReadRequest) settings. Fatal errors will occur without this argument.
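If the argument is missing, it can be verified and added through GRUB before starting the replacement. This is a minimal sketch assuming Oracle Linux 7 with GRUB 2 booted in BIOS mode; the grub.cfg path differs on UEFI systems, so confirm the paths on the target system before editing.

    Confirm whether the running kernel was booted with the argument:
    # cat /proc/cmdline

    Append pci=pcie_bus_perf to the GRUB_CMDLINE_LINUX line in /etc/default/grub,
    then regenerate the GRUB configuration and reboot:
    # grub2-mkconfig -o /boot/grub2/grub.cfg
    # reboot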

  1. Log in to Oracle Linux that is running on the server.
  2. Obtain information about available NVMe storage devices.
    1. Obtain the PCIe addresses (Bus Device Function) of enabled NVMe drives using the following command.  
      # find /sys/devices |egrep 'nvme[0-9][0-9]?$'
      /sys/devices/pci0000:85/0000:85:00.0/0000:86:00.0/nvme/nvme0
      /sys/devices/pci0000:85/0000:85:01.0/0000:8d:00.0/nvme/nvme1
      /sys/devices/pci0000:d7/0000:d7:02.0/0000:d9:00.0/nvme/nvme2
      /sys/devices/pci0000:d7/0000:d7:03.0/0000:e0:00.0/nvme/nvme3
    2.  Obtain the PCIe virtual slot number (APIC ID)   
      # egrep -H '.*' /sys/bus/pci/slots/*/address
      /sys/bus/pci/slots/0-1/address:0000:17:00
      /sys/bus/pci/slots/0-2/address:0000:d7:00
      /sys/bus/pci/slots/0-3/address:0000:01:00
      /sys/bus/pci/slots/0/address:0000:00:00
      /sys/bus/pci/slots/100-1/address:0000:19:00
      /sys/bus/pci/slots/100/address:0000:17:02
      /sys/bus/pci/slots/101-1/address:0000:20:00
      /sys/bus/pci/slots/101/address:0000:17:03
      /sys/bus/pci/slots/102-1/address:0000:9b:00
      /sys/bus/pci/slots/102/address:0000:85:03
      /sys/bus/pci/slots/103-1/address:0000:94:00
      /sys/bus/pci/slots/103/address:0000:85:02
      /sys/bus/pci/slots/104-1/address:0000:8d:00
      /sys/bus/pci/slots/104/address:0000:85:01
      /sys/bus/pci/slots/105-1/address:0000:86:00
      /sys/bus/pci/slots/105/address:0000:85:00
      /sys/bus/pci/slots/106-1/address:0000:e0:00
      /sys/bus/pci/slots/106/address:0000:d7:03
      /sys/bus/pci/slots/107-1/address:0000:d9:00
      /sys/bus/pci/slots/107/address:0000:d7:02
      /sys/bus/pci/slots/1/address:0000:ae:00
      /sys/bus/pci/slots/2/address:0000:3a:00
      /sys/bus/pci/slots/3/address:0000:5d:00
      /sys/bus/pci/slots/4/address:0000:5d:02
      /sys/bus/pci/slots/8191-1/address:0000:80:00
      /sys/bus/pci/slots/8191/address:0000:3a:02
       In the above output, notice that the instance names for the NVMe drives do not correspond to the NVMe drive labels on the front of the server. For example, pci/slots/105-1/address:0000:86:00 corresponds to instance nvme0; however, on the front of the server this drive is labeled NVMe5. (A sketch that automates this mapping appears after this list.)
  3. Prepare the NVMe drive for removal by removing the NVMe storage device path. (An example command sequence is sketched after this list.)
    1. Use the umount command to unmount any file systems that are mounted on the device.
      In Linux, NVMe drives do not use the standard block device labeling, such as /dev/sd*. For example, the block device for NVMe drive 0 with a single namespace would be /dev/nvme0n1. If you formatted and partitioned that namespace with a single partition, the partition would be /dev/nvme0n1p1.
    2. Remove the device from any multiple device (md) and Logical Volume Manager (LVM) volume that is using it. If the device is a member of an LVM volume group, it may be necessary to move data off the device using the pvmove command, then use the vgreduce command to remove the physical volume, and (optionally) use pvremove to remove the LVM metadata from the disk.
    3. If the device uses multipathing, run multipath -l and note all the paths to the device. Then, remove the multipathed device using the multipath -f device command.
    4. Run the blockdev --flushbufs device command to flush any outstanding I/O on all paths to the device.
  4. Power off the NVMe slot with the command  "echo 0 > /sys/bus/pci/slots/slot_number/power"
    Where slot_number is the PCIe slot number obtained in step 2.b above.
    For example, to power off the NVMe disk labeled NVMe5 (PCIe slot 105-1):
    # echo 0 > /sys/bus/pci/slots/105-1/power
  5. Verify that the blue OK to Remove indicator on the NVMe drive is lit.
  6. On the NVMe drive you plan to remove, push the latch release button to open the drive latch.
  7. Grasp the latch and pull the drive out of the drive slot.
  8. Verify that the NVMe drive has been removed. Type  
    # lspci -nnd :0a54
    8d:00.0 Non-Volatile memory controller [0108]: Intel Corporation Device [8086:0a54]
    d9:00.0 Non-Volatile memory controller [0108]: Intel Corporation Device [8086:0a54]
    e0:00.0 Non-Volatile memory controller [0108]: Intel Corporation Device [8086:0a54]
  9. Note that address 86:00.0, which represents PCIe slot 105-1 (the drive labeled NVMe5 on the system front panel that was powered off), is no longer listed.
  10. After you physically remove an NVMe drive from the server, wait at least 10 seconds before installing a replacement drive.
  11. Align the replacement drive with the drive slot.
  12. Slide the drive into the slot until the drive is fully seated.
  13. Close the drive latch to lock the drive in place. 
  14. To power on the slot for the drive, type "echo 1 > /sys/bus/pci/slots/slot_number/power", where slot_number is the PCIe slot number assigned to the NVMe device slot (see step 4 above).
    For example, to power on the newly installed NVMe disk NVMe5 (PCIe slot 105-1):
    # echo 1 > /sys/bus/pci/slots/105-1/power
  15. Confirm that the drive has been enabled and is seen by the system.
    1. Check the /var/log/messages log file.
    2. List available NVMe devices. Type: "ls -l /dev/nvme*"
    3. List the NVMe PCI devices:
      # lspci -nnd :0a54
      86:00.0 Non-Volatile memory controller [0108]: Intel Corporation Device [8086:0a54]
      8d:00.0 Non-Volatile memory controller [0108]: Intel Corporation Device [8086:0a54]
      d9:00.0 Non-Volatile memory controller [0108]: Intel Corporation Device [8086:0a54]
      e0:00.0 Non-Volatile memory controller [0108]: Intel Corporation Device [8086:0a54]

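The cross-reference described in step 2 and the pre-removal teardown described in step 3 can be illustrated with the sketch below. The mount point (/data), volume group name (myvg), and multipath map name (mpatha) are hypothetical placeholders and must be replaced with the values actually in use; the drive shown is instance nvme0 with namespace block device /dev/nvme0n1, matching the example addresses above.

    Map an NVMe instance back to its virtual PCIe slot (step 2):
    # readlink -f /sys/class/nvme/nvme0/device
    /sys/devices/pci0000:85/0000:85:00.0/0000:86:00.0
    # grep -l '0000:86:00' /sys/bus/pci/slots/*/address
    /sys/bus/pci/slots/105-1/address

    Tear down any users of the device before powering off the slot (step 3):
    # umount /data                          (unmount file systems on the device)
    # pvmove /dev/nvme0n1p1                 (migrate LVM extents off the physical volume)
    # vgreduce myvg /dev/nvme0n1p1          (remove the physical volume from its volume group)
    # pvremove /dev/nvme0n1p1               (optionally clear the LVM metadata)
    # multipath -f mpatha                   (remove the multipath map, if multipathing is used)
    # blockdev --flushbufs /dev/nvme0n1     (flush any outstanding I/O)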
 


How to replace an NVMe disk on Microsoft Windows Server

NVMe storage drive hot plug is not supported for an Oracle Server X7-2 running Microsoft Windows Server. The system must be powered down before removing and replacing an NVMe storage drive.

  1. Power down the server that contains the storage drive to be removed.
  2. On the NVMe drive you plan to remove, push the latch release button to open the drive latch.
  3. Grasp the latch and pull the drive out of the drive slot.
  4. Align the replacement drive with the drive slot.
  5. Slide the drive into the slot until the drive is fully seated.
  6. Close the drive latch to lock the drive in place.
  7. Power on the server.


REFERENCE INFORMATION:

Refer to the Oracle Server X7-2 Service Manual or System Handbook for part information.

Oracle Server X7-2 Service Manual

Oracle System Handbook - Oracle Server X7-2

 


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.