Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-2034530.1
Update Date: 2018-05-14
Keywords:

Solution Type: Technical Instruction

Solution  2034530.1 :   How to Replace an Oracle Server X5-2L, X6-2L NVMe Disk [VCAP]  


Related Items
  • Oracle Server X6-2L
  • Oracle Server X5-2L
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: x64-CAP VCAP




In this Document
Goal
Solution
 NVMe Storage Drive Virtual PCIe Slot Designation:


Applies to:

Oracle Server X5-2L - Version All Versions to All Versions [Release All Releases]
Oracle Server X6-2L - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Goal

How to Replace an Oracle Server X5-2L, X6-2L NVMe Disk.

Solution

DISPATCH INSTRUCTIONS

WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED:
No special skills required; this is a Customer Replaceable Unit (CRU) procedure.

TIME ESTIMATE: 30 minutes

TASK COMPLEXITY: 0

FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:

PROBLEM OVERVIEW: An Oracle Server X5-2L, X6-2L NVMe Disk needs replacement

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?

It is expected that the Oracle Server X5-2L/X6-2L is up and running, with the operating system booted and the failed drive identified.

Before proceeding, confirm the part number of the part in hand (either from logistics or an on-site spare) matches the part number dispatched for replacement.

The following commands are provided as a guide in case the customer needs assistance checking the system prior to replacement. If the customer or FSE requires more assistance prior to the physical replacement of the device, X86 HW TSC should be contacted.

 

NVMe storage drives are only supported on X5-2L/X6-2L servers running the Oracle Solaris or Oracle Linux operating system.
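
If there is any doubt about which operating system the host is running, the following standard checks can be used (the release file locations are the usual defaults and are stated here as assumptions):

On Oracle Linux:
# cat /etc/oracle-release
# uname -r

On Oracle Solaris:
# cat /etc/release
# uname -v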


How to replace an NVMe disk from the Oracle Solaris operating system:


Before you begin, the Solaris hotplug daemon must be enabled on the host.
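
A quick way to verify this, assuming the standard Oracle Solaris 11 SMF service name (svc:/system/hotplug:default):

Check the service state (it should report online):
# svcs hotplug
Enable the service if it is disabled:
# svcadm enable hotplug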

NVMe Storage Drive Virtual PCIe Slot Designation:

If NVMe storage drives are installed, they are labeled on the system front panel as NVMe0, NVMe1, NVMe2, and NVMe3. However, the server BIOS internally identifies these drives by their virtual PCIe slot numbers. When using operating system commands to power NVMe drives off before removal, you need to know the virtual PCIe slot number of the drive.

The following table lists the drive front panel label and its corresponding virtual PCIe slot number used by the operating system.

Front Panel Storage Drive Label     Virtual PCIe Slot Number
NVMe0 (HDD3)                        PCIe slot 10
NVMe1 (HDD4)                        PCIe slot 11
NVMe2 (HDD19)                       PCIe slot 12
NVMe3 (HDD20)                       PCIe slot 13

Note that the virtual PCIe slot name is not the same as the name on the server front panel label.

1.  Log in to the Oracle Solaris host.

2.  Find the NVMe drive virtual PCIe slot number using the "hotplug list -lc" command: 

# hotplug list -lc
Connection           State           Description
Path                          
________________________________________________________________________________
pcie12               ENABLED         PCIe-Native
/pci@7a,0/pci8086,2f08@3/pci111d,80b5@0/pci111d,80b5@4
pcie13               ENABLED         PCIe-Native
/pci@7a,0/pci8086,2f08@3/pci111d,80b5@0/pci111d,80b5@5
pcie11               ENABLED         PCIe-Native
/pci@7a,0/pci8086,2f08@3/pci111d,80b5@0/pci111d,80b5@6
pcie10               ENABLED         PCIe-Native
/pci@7a,0/pci8086,2f08@3/pci111d,80b5@0/pci111d,80b5@7

3.  Prepare the NVMe drive for removal by powering off the drive slot using the "hotplug poweroff" command. In the following example, we power off the drive labeled NVMe0 on the front panel (PCIe slot 10, connection pcie10):

# hotplug poweroff pcie10

You can see that the drive is now powered off using the "hotplug list" command; the powered-off drive will show a state of PRESENT:

 # hotplug list -lc
Connection           State           Description
Path                          
________________________________________________________________________________
pcie12               ENABLED         PCIe-Native
/pci@7a,0/pci8086,2f08@3/pci111d,80b5@0/pci111d,80b5@4
pcie13               ENABLED         PCIe-Native
/pci@7a,0/pci8086,2f08@3/pci111d,80b5@0/pci111d,80b5@5
pcie11               ENABLED         PCIe-Native
/pci@7a,0/pci8086,2f08@3/pci111d,80b5@0/pci111d,80b5@6
pcie10               PRESENT         PCIe-Native
/pci@7a,0/pci8086,2f08@3/pci111d,80b5@0/pci111d,80b5@7

4.  Verify that the blue OK to Remove indicator on the NVMe drive is lit.

5.  On the drive you plan to remove, push the latch release button to open the drive latch.

6.  Grasp the latch and pull the drive out of the drive slot.

7.  Verify that the NVMe drive has been removed; in the "hotplug list -lc" output the slot should now report EMPTY:

# hotplug list -lc
Connection           State           Description
Path                          
________________________________________________________________________________
pcie12               ENABLED         PCIe-Native
/pci@7a,0/pci8086,2f08@3/pci111d,80b5@0/pci111d,80b5@4
pcie13               ENABLED         PCIe-Native
/pci@7a,0/pci8086,2f08@3/pci111d,80b5@0/pci111d,80b5@5
pcie11               ENABLED         PCIe-Native
/pci@7a,0/pci8086,2f08@3/pci111d,80b5@0/pci111d,80b5@6
pcie10               EMPTY           PCIe-Native
/pci@7a,0/pci8086,2f08@3/pci111d,80b5@0/pci111d,80b5@7

8.  Align the replacement drive with the drive slot.

9.  Slide the drive into the slot until the drive is fully seated.

10.  Close the drive latch to lock the drive in place.

11.  Power on the slot for the drive with the "hotplug enable" command. (The system may perform this step automatically; if not, run the following command.)

# hotplug enable pcie10

12.  Confirm that the drive has been enabled and is seen by the system.

# hotplug list -lc
Connection           State           Description
Path                          
________________________________________________________________________________
pcie12               ENABLED         PCIe-Native
/pci@7a,0/pci8086,2f08@3/pci111d,80b5@0/pci111d,80b5@4
pcie13               ENABLED         PCIe-Native
/pci@7a,0/pci8086,2f08@3/pci111d,80b5@0/pci111d,80b5@5
pcie11               ENABLED         PCIe-Native
/pci@7a,0/pci8086,2f08@3/pci111d,80b5@0/pci111d,80b5@6
pcie10               ENABLED         PCIe-Native
/pci@7a,0/pci8086,2f08@3/pci111d,80b5@0/pci111d,80b5@7

13. To check the NVMe drive health, firmware level, temperature, error log, and SMART data, or to perform a low-level format, use the nvmeadm commands, for example:

# nvmeadm list
SUNW-NVME-1
SUNW-NVME-2
SUNW-NVME-3
SUNW-NVME-4
# nvmeadm getlog -h SUNW-NVME-1
SUNW-NVME-1
SMART/Health Information:
        Critical Warning: 0
        Temperature: 299 Kelvin
        Available Spare: 100 percent
        Available Spare Threshold: 10 percent
        Percentage Used: 0 percent
        Data Unit Read: 0x1a0c67dd of 512k bytes.
        Data Unit Written: 0x4500b024 of 512k bytes.
        Number of Host Read Commands: 0x84bf07a
        Number of Host Write Commands: 0x47f8344
        Controller Busy Time in Minutes: 0x3
        Number of Power Cycle: 0x55
        Number of Power On Hours: 0x11f1
        Number of Unsafe Shutdown: 0x51
        Number of Media Errors: 0x0
        Number of Error Info Log Entries: 0x0
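
For reference, the Temperature field in this SMART output is reported in Kelvin: 299 K is approximately 26 degrees Celsius (299 - 273.15 = 25.85 C).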

 


 
How to replace an NVMe disk from the Oracle Linux operating system:

Linux NVMe hot plug requires the kernel boot argument "pci=pcie_bus_perf" to be set in order to get the proper MPS (MaxPayloadSize) and MRR (MaxReadRequest) settings. Fatal errors will occur without this argument.
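
The following is a minimal sketch of how this boot argument is commonly added on Oracle Linux 7 with GRUB 2; the exact file paths are assumptions and should be verified against the installed Oracle Linux release.

Check whether the argument is already active on the running kernel:
# grep pcie_bus_perf /proc/cmdline
Append pci=pcie_bus_perf to the GRUB_CMDLINE_LINUX line, regenerate the GRUB configuration, and reboot:
# vi /etc/default/grub
# grub2-mkconfig -o /boot/grub2/grub.cfg
# reboot
(On UEFI installations the generated file is typically /boot/efi/EFI/redhat/grub.cfg instead.)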

  1. Log in to Oracle Linux that is running on the server.
  2. Obtain information about available NVMe storage devices.
    1. Obtain the PCIe addresses (Bus Device Function) of enabled NVMe drives using the following command.  
      # find /sys/devices |egrep 'nvme[0-9][0-9]?$'
      /sys/devices/pci0000:80/0000:80:03.0/0000:b0:00.0/0000:b1:04.0/0000:b2:00.0/misc/nvme0
      /sys/devices/pci0000:80/0000:80:03.0/0000:b0:00.0/0000:b1:05.0/0000:b4:00.0/misc/nvme1
      /sys/devices/pci0000:80/0000:80:03.0/0000:b0:00.0/0000:b1:06.0/0000:b6:00.0/misc/nvme2
      /sys/devices/pci0000:80/0000:80:03.0/0000:b0:00.0/0000:b1:07.0/0000:b8:00.0/misc/nvme3
    2.  Obtain the PCIe virtual slot number (APIC ID)   
      # egrep -H '.*' /sys/bus/pci/slots/1?/address
      /sys/bus/pci/slots/10/address:0000:b8:00
      /sys/bus/pci/slots/11/address:0000:b6:00
      /sys/bus/pci/slots/12/address:0000:b2:00
      /sys/bus/pci/slots/13/address:0000:b4:00
       For example, the PCIe address 0000:b8:00.0 matches the PCIe slot number (10) for the drive labeled NVMe0 on the system front panel.
    3. Obtain the NVME storage device path  
      # parted -l | grep nvme
      Disk /dev/nvme0n1: 1600GB
      Disk /dev/nvme1n1: 1600GB
      Disk /dev/nvme2n1: 1600GB
      Disk /dev/nvme3n1: 1600GB
       The devices correspond to the physical slots as follows:
      /dev/nvme0n1 - NVMe0
      /dev/nvme1n1 - NVMe1
      /dev/nvme2n1 - NVMe2
      /dev/nvme3n1 - NVMe3
  3. Prepare the NVMe drive for removal (a consolidated command sketch for these sub-steps follows the numbered list).
    1. Use the umount command to unmount any file systems that are mounted on the device.
      In Linux, NVMe drives do not use the standard block device naming, such as /dev/sd*. For example, the block device for NVMe drive 0 with a single namespace would be /dev/nvme0n1; if you formatted and partitioned that namespace with a single partition, the partition would be /dev/nvme0n1p1.
    2.  Remove the device from any multiple device (md) or Logical Volume Manager (LVM) volume that is using it. If the device is a member of an LVM volume group, it may be necessary to move data off the device using the pvmove command, then use the vgreduce command to remove the physical volume, and (optionally) pvremove to clear the LVM metadata from the disk.
    3. If the device uses multipathing, run multipath -l and note all the paths to the device. Then, remove the multipathed device using the multipath -f device command.
    4. Run the blockdev --flushbufs device command to flush any outstanding I/O on all paths to the device (where device is the /dev entry from step 2c above).
  4. Power off the NVMe slot with the command "echo 0 > /sys/bus/pci/slots/slot_number/power".
    Where slot_number is the PCIe slot number assigned to the NVMe device slot:
    10 - NVMe0
    11 - NVMe1
    12 - NVMe2
    13 - NVMe3 
    For example, to power off the NVMe disk NVMe0 (PCIe slot 10):
    # echo 0 > /sys/bus/pci/slots/10/power
     
  5. Verify that the blue OK to Remove indicator on the NVMe drive is lit.
  6. On the NVMe drive you plan to remove, push the latch release button to open the drive latch.
  7. Grasp the latch and pull the drive out of the drive slot.
  8. Verify that the NVMe drive has been removed. Type  
    # lspci -nnd :0953
    b2:00.0 Non-Volatile memory controller [0108]: Intel Corporation Device [8086:0953] (rev 01)
    b4:00.0 Non-Volatile memory controller [0108]: Intel Corporation Device [8086:0953] (rev 01)
    b6:00.0 Non-Volatile memory controller [0108]: Intel Corporation Device [8086:0953] (rev 01)
      Note that address b8:00.0, which represents PCIe slot 10 (the drive labeled NVMe0 on the system front panel) and was powered off, is no longer listed.
  9. After you physically remove an NVMe drive from the server, wait at least 10 seconds before installing a replacement drive.
  10. Align the replacement drive with the drive slot.
  11. Slide the drive into the slot until the drive is fully seated.
  12. Close the drive latch to lock the drive in place. 
  13. To power on the slot for the drive, type "echo 1 > /sys/bus/pci/slots/slot_number/power", where slot_number is the PCIe slot number assigned to the NVMe device slot (see step 4 above).
    For example, to power on the newly installed NVMe disk NVMe0 (PCIe slot 10):
    # echo 1 > /sys/bus/pci/slots/10/power
  14. Confirm that the drive has been enabled and is seen by the system.
    • Check the /var/log/messages log file.
    • List available NVMe devices. Type: "ls -l /dev/nvme*"
    • List the NVMe PCI devices:
      # lspci -nnd :0953
      b2:00.0 Non-Volatile memory controller [0108]: Intel Corporation Device [8086:0953] (rev 01)
      b4:00.0 Non-Volatile memory controller [0108]: Intel Corporation Device [8086:0953] (rev 01)
      b6:00.0 Non-Volatile memory controller [0108]: Intel Corporation Device [8086:0953] (rev 01)
      b8:00.0 Non-Volatile memory controller [0108]: Intel Corporation Device [8086:0953] (rev 01)
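
The following is a minimal consolidated sketch of the pre-removal preparation described in step 3, assuming a hypothetical drive /dev/nvme0n1 (front panel NVMe0, PCIe slot 10) with a single partition /dev/nvme0n1p1 that is an LVM physical volume in a hypothetical volume group vg_data; substitute the names from the actual configuration, and skip the LVM and multipath commands if they do not apply.

Unmount any file systems on the device:
# umount /dev/nvme0n1p1
Move data off the physical volume, remove it from the volume group, and (optionally) clear the LVM metadata:
# pvmove /dev/nvme0n1p1
# vgreduce vg_data /dev/nvme0n1p1
# pvremove /dev/nvme0n1p1
If multipathing is used, note the paths and remove the multipathed device:
# multipath -l
# multipath -f <multipath_device>
Flush any outstanding I/O and power off the drive slot:
# blockdev --flushbufs /dev/nvme0n1
# echo 0 > /sys/bus/pci/slots/10/power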

PARTS NOTE:

Refer to the Oracle Server X5-2L/X6-2L Service Manual or System Handbook for part information.

REFERENCE INFORMATION:

Oracle Server X5-2L Service Manual

Oracle System Handbook - Oracle Server X5-2L

Oracle Server X6-2L Service Manual

Oracle System Handbook - Oracle Server X6-2L


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.