Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1347270.1
Update Date:2018-05-24
Keywords:

Solution Type  Technical Instruction Sure

Solution  1347270.1 :   How to Replace a Sun Fire X4170, X4270 HDD  


Related Items
  • Sun Fire X4170 Server
  • Sun Fire X4270 Server
  • Oracle Exalogic Elastic Cloud X2-2 Hardware
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: x64-CAP VCAP




In this Document
Goal
Solution
References


Applies to:

Sun Fire X4270 Server - Version Not Applicable and later
Sun Fire X4170 Server - Version Not Applicable to Not Applicable [Release N/A]
Oracle Exalogic Elastic Cloud X2-2 Hardware - Version X2 and later
Information in this document applies to any platform.

Goal

How to Replace a Sun Fire X4170, X4270 HDD.

Solution

DISPATCH INSTRUCTIONS

WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED:
No special skills required, Customer Replaceable Unit (CRU) procedure

TIME ESTIMATE: 30 minutes

TASK COMPLEXITY: 0

FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:

PROBLEM OVERVIEW: A Sun Fire X4170, X4270 HDD needs replacement

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY? :

A data backup is not a pre-requisite but is a wise precaution.
The customer should conduct an orderly software system shutdown.

Then power down the system and disconnect the Power Cord.

 

The customer should configure the BIOS RAID utility or operating system so that the disk drive is free for replacement.
For example, the disk may be part of a BIOS RAID configuration, volume management software, a mounted file system, raw partitions, etc.
A data backup is recommended as a precaution and may be required to restore data onto the replaced disk.
An orderly shutdown of applications and the OS may be required if the disk being replaced is the boot device and no mirroring is in place. In that case the customer will need to restore/recover from backup.

- WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE?:

1. Identify and confirm the drive needing replacement

There are several ways to identify and verify the failed drive:

a) LEDs on the front of the failed drive.

For a hard failure, the amber "Service Action Required" LED on the front of the failed drive should be illuminated or flashing.
The blue "OK to Remove" LED should also be illuminated or flashing, but may not be, depending on the nature of the failure mode and when it occurred.
If the middle LED is amber, the drive is faulty.


b) From OS or RAID BIOS
If the disk is part of a hardware RAID configuration, or you need to verify whether it is, there are two possible scenarios for checking the disk and LUN status, one for each of the two controller models supported on the X4170/X4270.
If the disk is not part of a hardware RAID configuration, skip to step 2.

- First scenario

SG-XPCIE8SAS-I-Z
8-Port 3Gbps Serial Attached SCSI HBA
LSI chip based controller


VERIFY LSI CONTROLLER STATUS FROM WITHIN SOLARIS (CLI):

You can use the "raidctl" command provided with Solaris.
Platforms using Solaris 10 Update 3 or lower produce output similar to the following when configured under LSI hardware RAID management:

Failed Volume:
# /usr/sbin/raidctl -l

RAID    Volume  RAID            RAID            Disk
Volume  Type    Status          Disk            Status
------------------------------------------------------
c1t0d0  IM      DEGRADED        c1t0d0          OK
                                c1t1d0          FAILED

Note that the “RAID Status” of the volume is “DEGRADED” and the “RAID Disk” “c1t1d0” has a “Disk Status” of “FAILED”.
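
The check described above can be automated. The following is a minimal Python sketch, assuming the output of "raidctl -l" has been captured as text and has the same layout as the sample shown. The function names and the embedded sample are illustrative only and are not part of any Oracle tool.

```python
def parse_raidctl_legacy(text):
    """Parse legacy `raidctl -l` output into a list of volumes with member disks."""
    volumes = []
    in_body = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if set(line.strip()) == {"-"}:
            # The dashed separator ends the two header lines.
            in_body = True
            continue
        if not in_body:
            continue
        if len(fields) == 5:
            # Volume line: volume, type, RAID status, first disk, disk status.
            name, vol_type, status, disk, disk_status = fields
            volumes.append({"volume": name, "type": vol_type,
                            "status": status, "disks": {disk: disk_status}})
        elif len(fields) == 2 and volumes:
            # Continuation line: an additional member disk of the last volume.
            disk, disk_status = fields
            volumes[-1]["disks"][disk] = disk_status
    return volumes


def failed_disks(volumes):
    """Return (volume, disk) pairs for every member disk whose status is not OK."""
    return [(v["volume"], d)
            for v in volumes
            for d, s in v["disks"].items() if s != "OK"]


# Sample text copied from the output shown in this document.
SAMPLE = """\
RAID    Volume  RAID            RAID            Disk
Volume  Type    Status          Disk            Status
------------------------------------------------------
c1t0d0  IM      DEGRADED        c1t0d0          OK
                                c1t1d0          FAILED
"""

volumes = parse_raidctl_legacy(SAMPLE)
print(failed_disks(volumes))
```

The sketch splits on whitespace rather than fixed columns, so it tolerates differences in column alignment but assumes no field contains spaces.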
Platforms using Solaris 10 Update 4 or higher produce output similar to the following when configured under LSI hardware RAID management:


Failed Volume:
# /usr/sbin/raidctl -l

Controller: 0
Volume:c1t0d0
Disk: 0.0.0
Disk: 0.1.0

In addition to this, specifics for the "Volume" can be listed by executing the command as follows:
# /usr/sbin/raidctl -l c1t0d0

Volume  Sub     Size    Stripe  Status    Cache  RAID
        Disk            Size                     Level
----------------------------------------------------------------
c1t0d0          68.3G   N/A     DEGRADED  N/A    RAID1
        0.0.0   68.3G           GOOD
        0.1.0   68.3G           FAILED

Note that the “Status” of the “Volume” “c1t0d0” is “DEGRADED” and the “Sub Disk” “0.1.0” has a “Status” of “FAILED”.
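
As a rough illustration of reading the per-volume listing, the Python sketch below extracts the volume status and each sub disk status from captured "raidctl -l <volume>" text. It assumes the layout shown above (six fields on the volume line, three on each sub disk line). The function name and sample are hypothetical.

```python
def parse_raidctl_volume(text):
    """Return (volume_info, sub_disk_status) from `raidctl -l <volume>` output."""
    volume = None
    sub_disks = {}
    in_body = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if set(line.strip()) == {"-"}:
            # Everything before the dashed separator is header text.
            in_body = True
            continue
        if not in_body:
            continue
        if len(fields) == 6:
            # Volume line: name, size, stripe size, status, cache, RAID level.
            volume = {"name": fields[0], "size": fields[1],
                      "status": fields[3], "level": fields[5]}
        elif len(fields) == 3:
            # Sub disk line: id, size, status.
            sub_disks[fields[0]] = fields[2]
    return volume, sub_disks


# Sample text copied from the output shown in this document.
SAMPLE = """\
Volume  Sub     Size    Stripe  Status    Cache  RAID
        Disk            Size                     Level
----------------------------------------------------------------
c1t0d0          68.3G   N/A     DEGRADED  N/A    RAID1
        0.0.0   68.3G           GOOD
        0.1.0   68.3G           FAILED
"""

volume, sub_disks = parse_raidctl_volume(SAMPLE)
bad = [disk for disk, status in sub_disks.items() if status != "GOOD"]
print(volume["name"], volume["status"], bad)
```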

VERIFY LSI CONTROLLER STATUS FROM WITHIN SOLARIS, WINDOWS, LINUX AND OTHER SUPPORTED OSs (GUI):

You can use the "MegaRAID Storage Management (MSM)" GUI.
It is not bundled with Solaris; it is an optional component that the customer must install if needed.
The GUI is easy to use. Please refer to the "Sun LSI 106x RAID User’s Guide (820-4933)" for installation and usage.


VERIFY LSI CONTROLLER STATUS FROM WITHIN BIOS (outage required to reboot the server):

To verify disk failures within a RAID array on an LSI based controller, the user can view the LSI BIOS at boot time.
The utility may tell the user that the RAID array is in a degraded state. Output may vary depending on firmware level and configuration:

OUTPUT AT BOOT TIME WITH LSI 1064/1068 CONTROLLER:

LSI Logic Corp. MPT SAS BIOS
MPTBIOS-6.02.00.00 (2005.07.08)
Copyright 2000-2005 LSI Logic Corp.
SLOT ID  LUN VENDOR   PRODUCT          REVISION   CAPACITY
---- --- --- -------- ---------------- ---------- ----------
0    0   0   LSILOGIC Logical Volume   3000       69618 MB
0            LSILogic SAS1064-IR       1.04.00.00


During BIOS boot, watch for LSI messages prompting you to press <CTRL> <C> to enter the configuration utility.
Press <CTRL> <C> to enter the LSI Logic MPT Setup Utility and select "Raid Properties":

LSI Logic MPT Setup Utility v6.02.00.00 (2005.07.08)
View Array -- LSI
    Array          1 of 1
    Identifier     LSILOGICLogical Volume 3000
    Type           IM
    Scan Order     2
    Size(MB)       69618
    Status         Degraded
    Manage Array

Scan  Device Identifier             RAID  Hot  Drive    Pred  Size
ID                                  Disk  Spr  Status   Fail  (MB)
0     FUJITSU MAV2073RCSUN72G 0301  Yes   No   Primary  No    69618
1     FUJITSU MAV2073RCSUN72G 0301  Yes   No   Failed   No

Note that the “Status” of the array is “Degraded” and disk “ID” “1” has a “Drive Status” of “Failed”.

 

- Second scenario

SGXPCIESAS-R-INT-Z
8-Port 3Gbps SAS RAID HBA
STK SAS Raid HBA - Intel/Adaptec chip based controller

VERIFY STK RAID CONTROLLER STATUS USING THE "arcconf" CLI (available for most supported OSs)

You can use the "arcconf" command. It is not provided by default with the OS, so the customer may need to install additional packages.
Please refer to the "Sun StorageTek RAID Manager Software User’s Guide (820-1177)" for software installation and usage.
If installed, "arcconf" is usually located in "/opt/StorMan/" (the first example below uses "/usr/StorMan/" instead; the path depends on the installation).

 

Example 1 - RAID5 configuration with failed disk in slot 2

# /usr/StorMan/arcconf GETCONFIG 1 LD

Controllers found: 1
----------------------------------------------------------------------
Logical device information
----------------------------------------------------------------------
Logical device number 0
Logical device name : Raid
RAID level : 5
Status of logical device : Degraded
Size : 857078 MB
Stripe-unit size : 256 KB
Read-cache mode : Enabled
Write-cache mode : Enabled (write-back)
Write-cache setting : Enabled (write-back) when protected by battery/ZMM
Partitioned : Yes
Protected by Hot-Spare : No
Bootable : Yes
Failed stripes : No
Power settings : Disabled

--------------------------------------------------------
Logical device segment information
--------------------------------------------------------
Segment 0 : Present (0,0) 000942714WKP 3SE14WKP
Segment 1 : Present (0,1) 00094270GW7S 3SE0GW7S
Segment 2 : Inconsistent (0,2)
Segment 3 : Present (0,3) 000942714YQX 3SE14YQX


----------------------------------------------------------------------
Physical Device information
----------------------------------------------------------------------
Device #0
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device(T:L) : 0,0(0:0)
Reported Location : Enclosure 0, Slot 0
Reported ESD(T:L) : 2,0(0:0)
Vendor : SEAGATE
Model : ST930003SSUN300G
Firmware : 0868
Serial number : 000942714WKP 3SE14WKP
World-wide name : 5000C500175875FC
Size : 286102 MB
Write Cache : Disabled (write-through)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
Device #1
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device(T:L) : 0,1(1:0)
Reported Location : Enclosure 0, Slot 1
Reported ESD(T:L) : 2,0(0:0)
Vendor : SEAGATE
Model : ST930003SSUN300G
Firmware : 0868
Serial number : 00094270GW7S 3SE0GW7S
World-wide name : 5000C500175871DC
Size : 286102 MB
Write Cache : Disabled (write-through)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
Device #2
Device is a Hard drive
State : Failed
Supported : Yes
Reported Channel,Device(T:L) : 0,2(2:0)
Reported Location : Connector 0, Device 2
Vendor : *MISSING*
Model :
Firmware :
Size : 0 MB
Write Cache : Unknown
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
Device #3
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device(T:L) : 0,3(3:0)
Reported Location : Enclosure 0, Slot 3
Reported ESD(T:L) : 2,0(0:0)
Vendor : SEAGATE
Model : ST930003SSUN300G
Firmware : 0868
Serial number : 000942714YQX 3SE14YQX
World-wide name : 5000C5001759712C
Size : 286102 MB
Write Cache : Disabled (write-through)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings : 0
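
To pick the failed drive out of a long listing like the one above, the "Physical Device information" section can be scanned for any device whose State is not Online. The Python sketch below is one way to do that on captured text. It assumes the "key : value" layout shown in this example. The function name and the shortened sample are illustrative only.

```python
import re


def parse_physical_devices(text):
    """Collect the "key : value" fields listed under each "Device #N" heading."""
    devices = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        heading = re.match(r"Device #(\d+)$", line)
        if heading:
            current = {"number": int(heading.group(1))}
            devices.append(current)
        elif current is None:
            continue
        elif " : " in line:
            # Split on the first spaced colon so "(T:L)" inside a key is kept.
            key, _, value = line.partition(" : ")
            current[key.strip()] = value.strip()
        elif line.endswith(" :"):
            # Field with no value, e.g. "Model :" on a missing drive.
            current[line[:-2].strip()] = ""
    return devices


# Shortened excerpt of the output shown in this document.
SAMPLE = """\
Device #1
Device is a Hard drive
State : Online
Reported Channel,Device(T:L) : 0,1(1:0)
Reported Location : Enclosure 0, Slot 1
Vendor : SEAGATE
Model : ST930003SSUN300G
Device #2
Device is a Hard drive
State : Failed
Reported Channel,Device(T:L) : 0,2(2:0)
Reported Location : Connector 0, Device 2
Vendor : *MISSING*
Model :
"""

devices = parse_physical_devices(SAMPLE)
not_online = [d for d in devices if d.get("State") != "Online"]
for d in not_online:
    print(d["number"], d["State"], d["Reported Channel,Device(T:L)"])
```

In this example the failed drive is Device #2 at channel 0, device 2, which matches the "Inconsistent" segment (0,2) in the logical device section.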

 

Example 2 - RAID1 (mirror) configuration with failed drive in slot 1

# /opt/StorMan/arcconf GETCONFIG 1 AL
Controllers found: 1
----------------------------------------------------------------------
Controller information
----------------------------------------------------------------------
Controller Status : Optimal
Channel description : SAS/SATA
Controller Model : Sun STK RAID INT
Controller Serial Number : 00817AA1900
Physical Slot : 0
Temperature : 92 C/ 197 F (Abnormal)
Installed memory : 256 MB
Copyback : Disabled
Background consistency check : Disabled
Automatic Failover : Enabled
Defunct disk drive count : 1
Logical devices/Failed/Degraded : 1/0/1
--------------------------------------------------------
Controller Version Information
--------------------------------------------------------
BIOS : 5.2-0 (15583)
Firmware : 5.2-0 (15583)
Driver : 2.2-4 (1)
Boot Flash : 5.2-0 (15583)
--------------------------------------------------------
Controller Battery Information
--------------------------------------------------------
Status : Failed

----------------------------------------------------------------------
Logical device information
----------------------------------------------------------------------
Logical device number 0
Logical device name : rootdisk
RAID level : 1
Status of logical device : Degraded
Size : 139890 MB
Read-cache mode : Enabled
Write-cache mode : Disabled (write-through)
Write-cache setting : Enabled (write-back) when protected by battery
Partitioned : Yes
Protected by Hot-Spare : No
Bootable : Yes
Failed stripes : No
--------------------------------------------------------
Logical device segment information
--------------------------------------------------------
Segment 0 : Present (0,0) 43980AR7 3NM80AR7
Segment 1 : Inconsistent (0,1)


----------------------------------------------------------------------
Physical Device information
----------------------------------------------------------------------
Device #0
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device : 0,0
Reported Location : Enclosure 0, Slot 0
Reported ESD : 2,0
Vendor : SEAGATE
Model : ST914602SSUN146G
Firmware : 0603
Serial number : 43980AR7 3NM80AR7
World-wide name : 5000C5000EFA0AAC
Size : 140009 MB
Write Cache : Disabled (write-through)
FRU : None
S.M.A.R.T. : No

Device #1
Device is a Hard drive
State : Failed
Supported : Yes
Reported Channel,Device : 0,1
Reported Location : Connector 0, Device 1
Vendor : *MISSING*
Model :
Firmware :
Size : 0 MB
Write Cache : Unknown
FRU : None
S.M.A.R.T. : No

Device #2
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device : 0,2
Reported Location : Enclosure 0, Slot 2
Reported ESD : 2,0
[...]
[...]
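
The controller summary and segment lines in the output above are enough for a quick health check. The Python sketch below, which assumes the field names appear exactly as in this example, reads the "Defunct disk drive count" and "Logical devices/Failed/Degraded" lines and lists any segment that is not "Present". The function name and sample are hypothetical.

```python
import re

# Matches lines such as "Segment 1 : Inconsistent (0,1)".
SEGMENT_RE = re.compile(r"^Segment (\d+)\s*:\s*(\w+)\s*\((\d+),(\d+)\)")


def summarize(text):
    """Extract defunct-drive count, logical device counts and non-Present segments."""
    summary = {"defunct": None, "logical": None, "bad_segments": []}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Defunct disk drive count"):
            summary["defunct"] = int(line.rpartition(":")[2])
        elif line.startswith("Logical devices/Failed/Degraded"):
            total, failed, degraded = (
                int(n) for n in line.rpartition(":")[2].split("/"))
            summary["logical"] = {"total": total, "failed": failed,
                                  "degraded": degraded}
        else:
            m = SEGMENT_RE.match(line)
            if m and m.group(2) != "Present":
                # (segment number, state, channel, device)
                summary["bad_segments"].append(
                    (int(m.group(1)), m.group(2),
                     int(m.group(3)), int(m.group(4))))
    return summary


# Lines copied from the output shown in this document.
SAMPLE = """\
Controller Status : Optimal
Defunct disk drive count : 1
Logical devices/Failed/Degraded : 1/0/1
Segment 0 : Present (0,0) 43980AR7 3NM80AR7
Segment 1 : Inconsistent (0,1)
"""

print(summarize(SAMPLE))
```

Note that the controller itself can report "Optimal" while a logical device is degraded, so the controller status line alone is not a sufficient check.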

 


Example 3 - Blink the locate LED of a disk to physically identify it

The following command blinks the disk locate LED. Run without arguments, it prints its usage:

# /opt/StorMan/arcconf IDENTIFY

Usage: IDENTIFY <Controller#> LOGICALDRIVE <LogicalDrive#>
Usage: IDENTIFY <Controller#> DEVICE <Channel# ID#>
======================================================

Identifies a logical device or a physical device.

LogicalDrive# : Number of the logical device to be identified
Channel# ID# : The Channel and ID of the physical device to be identified

For example, to blink the LED of Device #2 (channel 0, ID 2) from the earlier output:

# /opt/StorMan/arcconf IDENTIFY 1 DEVICE 0 2
Controllers found: 1
Only devices managed by an enclosure processor may be identified
The specified device is blinking.
Press any key to stop the blinking.
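
The channel and ID arguments come straight from the "Reported Channel,Device" field of the GETCONFIG output. The Python sketch below shows one way to build the IDENTIFY command line from that field, following the usage text above. It only constructs the argument list and does not run anything. The function name and the default path are assumptions for illustration.

```python
import re


def identify_command(controller, channel_device, arcconf="/opt/StorMan/arcconf"):
    """Build the argument list for `arcconf IDENTIFY <Controller#> DEVICE <Channel# ID#>`.

    channel_device is the value of the "Reported Channel,Device" field,
    for example "0,2" or "0,2(2:0)".
    """
    match = re.match(r"\s*(\d+)\s*,\s*(\d+)", channel_device)
    if not match:
        raise ValueError("unrecognized Channel,Device value: %r" % channel_device)
    channel, device_id = match.groups()
    return [arcconf, "IDENTIFY", str(controller), "DEVICE", channel, device_id]


cmd = identify_command(1, "0,2(2:0)")
print(" ".join(cmd))
```

Because the command waits for a key press to stop the blinking, it is best run interactively from a terminal.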

VERIFY STK RAID CONTROLLER STATUS FROM WITHIN SOLARIS, WINDOWS, LINUX AND OTHER SUPPORTED OSs (GUI):

You can use the "Sun StorageTek RAID Manager" GUI.
It is not bundled with the OS; it is an optional component that the customer must install if needed.
The GUI is easy to use. Please refer to the "Sun StorageTek RAID Manager Software User’s Guide" for installation and usage.

VERIFY STK RAID CONTROLLER STATUS FROM WITHIN BIOS: (outage required to reboot the server):

To verify disk failures within a RAID array on an Adaptec / Sun STK based controller, the user can view the Adaptec RAID BIOS at boot time.
The utility may tell the user that the RAID array is in a degraded state.
Output may vary depending on firmware level and configuration:

OUTPUT AT BOOT TIME WITH SUN STK ADAPTEC CONTROLLER:
Adaptec RAID BIOS V5.3-0 [Build 16732]
(c) 1998-2008 Adaptec, Inc. All Rights Reserved
Press <Ctrl><A> for Adaptec RAID
Booting the Controller Kernel....Controller started
Controller #00: Sun STK RAID EXT at PCI Slot:02, Bus:04, Dev:00, Func:00
Waiting for Controller to Start....Controller started
Controller monitor V5.3-0 [16732], Controller kernel V5.3-0 [16732]
Battery Backup Unit Present
Controller POST operation successful
Controller Memory Size: 256 MB
Controller Serial Number: SOMESERIALNUM
Controller WWN: SOMEWWNNUMBER
One or more drives are either missing or not responding.
Please check if the drives are connected and powered on.
Press <Enter> to accept the current configuration.
Press <Ctrl-A> to enter Adaptec RAID Configuration Utility.
Press <Ctrl-H> to Pause Configuration Messages.
(Default is not to accept if no valid key pressed in 30 second)
Timeout. BIOS took the default Configuration.
Location Model Rev# Speed Size
----------------------------------------------------------
J3 : Dev 00 ATA SEAGATE ST32500N 3AZQ 3.0G 238.4 GB
-- : -- No device

Note that the “Location” entry for the second device (“Dev 01”) is blank (“-- : --”) and its “Model” is reported as “No device”.


To access the controller options, press <Ctrl-A> when prompted to enter the STK RAID BIOS.

Select "Array Configuration Utility" -> "Manage Arrays" to view and edit the status and configuration of the logical drives.

 

2. If the disk or LUN is part of a software RAID configuration, volume management software, ZFS, etc., the customer should configure the operating system so that the disk drive is free for replacement.
See the "REFERENCE INFORMATION" section below for more details and useful guides.

3. Ensure the correct hard disk drive is being replaced and label it if necessary. Press the button on the front of the drive to release the spring-loaded lever.

4. Grasp the handle and remove the drive from the bay.

5. Open the latch of the replacement hard disk drive and insert the drive into the bay.

6. Push the drive into the bay until it stops. Close the spring-loaded lever handle to engage the hard disk drive with the hard disk backplane.

OBTAIN CUSTOMER ACCEPTANCE
- WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE?:

Customer to verify drive availability
Customer to use appropriate software commands to re-activate/re-sync mirror if manual intervention is required

PARTS NOTE:

https://support.oracle.com/handbook_private/Systems/SunFireX4170/components.html#Disks

 

REFERENCE INFORMATION:

Sun Fire X4170, X4270, and X4275 Servers Service Manual
http://docs.oracle.com/cd/E19477-01/820-5830-13/index.html

SG-XPCIE8SAS-I-Z Controller documentation (LSI)

- Sun LSI 106x RAID User’s Guide, 820-4933
http://docs.oracle.com/cd/E19477-01/820-4933-15/index.html

- Document 1013107.1 - How to Identify BIOS and Solaris[TM] Hardware RAID Status

- Document 1513610.1 - MegaCli and sas2ircu - utility to manage Internal Raid HBA (LSI-Niwot /Erie)


SGXPCIESAS-R-INT-Z Controller documentation (STK SAS HBA Intel/Adaptec)

- Sun StorageTek RAID Manager Software Users Guide, 820-1177
http://download.oracle.com/docs/cd/E19121-01/sf.x4150/820-1177-13/820-1177-13.pdf

- Uniform Command-Line Interface User's Guide (arcconf)
http://docs.oracle.com/cd/E19121-01/sf.x4150/820-2145-12/index.html

- Document 1013107.1 - How to Identify BIOS and Solaris[TM] Hardware RAID Status


Detailed SVM instructions can be found at
http://download.oracle.com/docs/cd/E19253-01/816-4520/troubleshoottasks-96/index.html

See Document 1010946.1 for Detailed Veritas Volume Manager instructions.
See Document 1002753.1 for Detailed ZFS instructions

References

<NOTE:1010946.1> - General Guidance for Diagnosis (Disk Failures/Errors) and Replacing Internal Server Disks and JBOD Disks within Solaris
<NOTE:1002753.1> - How to Replace a Drive in Solaris[TM] ZFS

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.