Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1383660.1
Update Date:2018-04-20

Solution Type  Technical Instruction Sure

Solution  1383660.1 :   Sun Fire[TM] Servers (V480, V490, V880, V890):Troubleshooting Fibre Channel Drives  


Related Items
  • Sun Fire V880z Visualization Server
  • Sun Fire V490 Server
  • Sun Fire V480 Server
  • Sun Fire V880 Server
  • Sun Fire V890 Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Workgroup Servers>SN-SPARC: SF-Vx80
  • _Old GCS Categories>Sun Microsystems>Servers>Entry-Level Servers




In this Document
Goal
Solution
 Architecture
 Background
 Single Backplane (Dual Path Shown)
 Dual Backplane (Dual Backplane Split Path Shown)
 Finding the Error
 Finding The Disk
 Identify the Configuration
 Find the Disk Slot
  Volume Manager Considerations
 For Further DPM Troubleshooting Steps if your issue is not solved by the other steps see the references section Below
References


Applies to:

Sun Fire V880z Visualization Server - Version Not Applicable and later
Sun Fire V480 Server - Version Not Applicable and later
Sun Fire V490 Server - Version Not Applicable and later
Sun Fire V880 Server - Version Not Applicable and later
Sun Fire V890 Server - Version Not Applicable and later
Information in this document applies to any platform.

Goal


Troubleshoot FC-AL disk issues on Sun Fire V480, V490, V880, and V890 servers.

Solution

  • Architecture

    • Background

Fibre Channel (FC) is a high-performance serial interconnect standard designed for bidirectional, point-to-point communication among servers, storage systems, workstations, switches, and hubs.

FC-AL devices employ a high-performance Gigabit serial interface, which supports multiple standard protocols such as Small Computer Systems Interface (SCSI) and Asynchronous Transfer Mode (ATM).

By supporting these standard protocols, FC-AL preserves any investment in existing legacy systems, firmware, applications, and software.

      • Supports 100-Mbyte per second data transfer rate (200 Mbytes per second with dual porting)
      • Capable of addressing up to 127 devices per loop
      • Provides for reliability, availability, and serviceability (RAS) features such as hot-pluggable and dual-ported disks, redundant data paths, and multiple host connections
      • Supports standard protocols such as IP and SCSI
      • The V480 and V490 use a very simple implementation with 2 disks on a single backplane. The disks can be accessed by 2 paths: the internal controller, or an optional second path from a PCI card.
      • Internal disks:
        • Disk 1       /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@WWN,0 (cXt0d0 lower disk)
          Disk 2       /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@WWN,0 (cXt1d0 upper disk)
      • The V880 and V890 use a more complex configuration with a single or dual disk backplane (Base Backplane and Expansion Backplane).
        The dual backplanes can be joined or split, and each can have a single path or dual path. One of the potentially 4 paths will be the internal controller:
        • Onboard FCAL-loopA    /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@WWN,0
    • Single Backplane (Dual Path Shown)

Figure: Single_BP_DualP. Detailed wiring drawings: Single Backplane Loop A, Single Backplane Loop A & B

    • Dual Backplane (Dual Backplane Split Path Shown)

Figure: Dual_BP_DualP. Detailed wiring drawings: Dual Backplane Joined Loop A, Dual Backplane Joined Loop A & B

 

 

Notes
  • The ports on the HBAs above marked with a red "X" are not to be used for external devices. Controlling internal disks and external arrays with the same FC port is unsupported.
  • The X6727A (375-3030) Crystal+ is the only PCI to FC-AL adapter that may be used for connectivity to an internal disk backplane.

 

 

    • Finding the Error

1. Check the messages files to determine which paths have the problem.

 
2. Find the paths attached to the internal drives
# luxadm probe -p
Found Enclosure:
SUNWGS INT FCBPL Name:FCloop Node WWN:5080020000181ab8
Logical Path:/dev/es/ses0
Physical Path:/devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ses@w5080020000181ab9,0:0
 

You can have multiple paths to the internal drives, such as /dev/es/ses0 and /dev/es/ses1.
Several luxadm subcommands can be run against a specific ses device.
For instance, ses0 is logically associated with FCloop. If you have another path to the internal drives, that path will have a different ses instance.
Counters are maintained on a per-port basis.

 
3. Display the loop and check for general status information. "Login failed" means that a drive is not present.
# luxadm display FCloop
SUNWGS INT FCBPL
DISK STATUS
SLOT DISKS (Node WWN)
0 On (O.K.) 500000e010137110
1 On (O.K.) 500000e01012e2e0
2 On (O.K.) 500000e01013a570
3 On (O.K.) 500000e0101301a0
4 On (O.K.) 500000e01013a600
5 On (O.K.) 500000e01012e4f0
6 On (Login failed)
7 On (Login failed)
8 On (Login failed)
9 On (Login failed)
10 On (Login failed)
11 On (Login failed)
SUBSYSTEM STATUS
FW Revision:9226 Box ID:0
Node WWN:5080020000181ab8 Enclosure Name:FCloop
SSC100's - 0=Base Bkpln, 1=Base LoopB, 2=Exp Bkpln, 3=Exp LoopB
SSC100 #0: O.K.(9226/ FD99)
SSC100 #1: O.K.(9226/ FD99)
SSC100 #2: Not Installed
SSC100 #3: Not Installed
Temperature Sensors - 0 Base, 1 Expansion
0 :22ºC
1 Not Installed
Default Language is USA English, ASCII
 

 
4. Dump the loop map of the loop in question. This gives the position of each device on the loop.
You can also dump the map for each ses instance by passing the ses device to dump_map
(luxadm -e dump_map /dev/es/ses0, luxadm -e dump_map /dev/es/ses1).
# luxadm -e dump_map FCloop
Pos AL_PA ID Hard_Addr Port WWN Node WWN Type
0 e0 5 e0 500000e01012e4f1 500000e01012e4f0 0x0 (Disk device)
1 e4 2 e4 500000e01013a571 500000e01013a570 0x0 (Disk device)
2 e2 3 e2 500000e0101301a1 500000e0101301a0 0x0 (Disk device)
3 e8 1 e8 500000e01012e2e1 500000e01012e2e0 0x0 (Disk device)
4 ef 0 ef 500000e010137111 500000e010137110 0x0 (Disk device)
5 e1 4 e1 500000e01013a601 500000e01013a600 0x0 (Disk device)
6 dc 6 dc 5080020000181ab9 5080020000181ab8 0xd (SES device)
7 1 7d 0 210000e08b000000 200000e08b000000 0x1f (Unknown Type,Host Bus Adapter)
 
 
 
5. Dump the Link Error Status Block (LESB). This is where it gets tricky. CRC errors are a good indication that there is a loop problem, usually from an upstream member.
Every counter needs to be assessed incrementally, since these are running totals.
Take a snapshot of the counters, then take another snapshot after you see the problem you are experiencing,
and compare the two. CRC errors indicate a problem on the loop, but do not provide a reliable path to the bad hardware in all cases.
High numbers of sync loss and invalid word don't necessarily mean there is a problem.
Some positions typically have higher "sync loss" than others.

# luxadm -e rdls FCloop
Link Error Status information for loop:
al_pa   lnk fail    sync loss   signal loss   sequence err   invalid word   CRC
e0      0           7           0             0              28             0
e4      0           8           0             0              32             0
e2      8           662         0             0              354713         0
e8      0           10          0             0              40             0
ef      0           1019188     0             0              3945682        0
e1      0           6           0             0              24             0
dc      0           0           0             0              10             0
1       0           1           1             0              0              0
 

NOTE: These LESB counts are not cleared by a reset, only power cycles.
These counts must be compared to previously read counts.
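
The snapshot comparison described in step 5 can be sketched in Python as follows. This is only an illustration: the helper names parse_rdls and rdls_delta are hypothetical, the first snapshot reuses two rows from the example output above, and the second snapshot contains made-up values. It assumes the `luxadm -e rdls` output has been saved as text twice, once before and once after the problem was seen.

```python
# Sketch only: parse_rdls and rdls_delta are hypothetical helper names.
COUNTERS = ["lnk fail", "sync loss", "signal loss",
            "sequence err", "invalid word", "CRC"]

def parse_rdls(text):
    """Map each AL_PA to its six LESB counters from `luxadm -e rdls` text."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue  # banner and header lines do not have exactly 7 columns
        try:
            int(fields[0], 16)                      # AL_PA is hexadecimal
            values = [int(f) for f in fields[1:]]   # counters are decimal totals
        except ValueError:
            continue
        table[fields[0].lower()] = dict(zip(COUNTERS, values))
    return table

def rdls_delta(before, after):
    """Return only the counters that grew between two snapshots."""
    delta = {}
    for al_pa, now in after.items():
        then = before.get(al_pa, {})
        grown = {name: value - then.get(name, 0)
                 for name, value in now.items() if value > then.get(name, 0)}
        if grown:
            delta[al_pa] = grown
    return delta

# First snapshot: two rows taken from the example output above.
SNAP_1 = """al_pa   lnk fail    sync loss   signal loss   sequence err   invalid word   CRC
e2      8           662         0             0              354713         0
ef      0           1019188     0             0              3945682        0
"""
# Second snapshot: made-up values, for illustration only.
SNAP_2 = """al_pa   lnk fail    sync loss   signal loss   sequence err   invalid word   CRC
e2      8           662         0             0              354713         0
ef      0           1019190     0             0              3945700        3
"""

print(rdls_delta(parse_rdls(SNAP_1), parse_rdls(SNAP_2)))
```

Only AL_PAs whose totals changed are reported, which keeps the large but static "sync loss" and "invalid word" totals from distracting from counters that are actually moving.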
    • Finding The Disk

    • Identify the Configuration

The type of configuration can be identified from an explorer using the output of various commands in the explorer's disks sub-directory (for example, explorer.838c4565.v880e-2012.01.17.21.45/disks).

The first step is to locate all the files in which the erring disk appears. `grep -l <wwn> *` in the disks sub-directory can be used. WWN is the World Wide Name of the disk.

Usually, the last 4 or 5 digits of the WWN are enough for a unique match.

 

$  grep -l df73 *
format.out   
ls-l_dev_rdsk.out
ls_-lAR_@dev_@devices.out

<Deleted>

luxadm_-e_dump_map_-devices-pci@8,600000-SUNW,qlc@2-fp@0,0:devctl.out
luxadm_-e_dump_map_-devices-pci@9,600000-pci@1-SUNW,qlc@4-fp@0,0:devctl.out
luxadm_display_50800200001f6978.out

 

In the example above, df73 shows up in several places, including the output of two different `luxadm -e dump_map` commands. dump_map looks at a device path and displays all FC devices on that path.

A disk WWN showing up in 2 dump_map files means the disk is dual pathed. Notice that one of the paths is /pci@8,600000/SUNW,qlc@2/fp@0,0, which is the internal controller on the V880/V890.
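
As a small illustration of the dual-path check, the following Python sketch counts how many dump_map output files appear in the `grep -l` result. The helper name dump_map_hits is hypothetical, and the file list is the subset of names shown in the example above.

```python
def dump_map_hits(matching_files):
    """Return the dump_map output files among the names reported by grep -l."""
    return [name for name in matching_files
            if name.startswith("luxadm_-e_dump_map_")]

# File names copied from the grep example above (the elided ones are omitted).
GREP_HITS = [
    "format.out",
    "ls-l_dev_rdsk.out",
    "luxadm_-e_dump_map_-devices-pci@8,600000-SUNW,qlc@2-fp@0,0:devctl.out",
    "luxadm_-e_dump_map_-devices-pci@9,600000-pci@1-SUNW,qlc@4-fp@0,0:devctl.out",
    "luxadm_display_50800200001f6978.out",
]

paths = dump_map_hits(GREP_HITS)
# Two or more dump_map files containing the WWN suggest a dual-pathed disk.
print(len(paths), "dump_map file(s); dual pathed:", len(paths) >= 2)
```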

The other path is a PCI card in Slot 8. The other way to see the number of paths is to look at the `luxadm display <enclosure>` output. In this case, the enclosure is 50800200001f6978, which is the node WWN.

Each Backplane is treated as an individual enclosure in split Backplane configurations.  In Joined backplane configurations, the backplanes are treated as one enclosure.

Other common names for the enclosure/backplane on the V880/V890 are FCloop and FCloop2. FCloop2 is usually the name given to the Expansion Backplane in a split configuration.

The name can be set with the luxadm command, so the enclosure name alone does not confirm a particular configuration.

 

Another way to determine the configuration type is by looking at the `luxadm display <enclosure>` output:

# luxadm display FCloop2
SUNWGS INT FCBPL
DISK STATUS
SLOT DISKS (Node WWN)
0 Not Installed
1 Not Installed
2 Not Installed
3 Not Installed
4 Not Installed
5 Not Installed
6 On (O.K.) 20000004cf4cfd9b
7 On (O.K.) 200000203790f8d0
8 On (O.K.) 200000203790f8bb
9 On (O.K.) 20000004cf6206a8
10 On (O.K.) 2000002037e34cd1
11 On (O.K.) 20000004cf620859
SUBSYSTEM STATUS
FW Revision:922A Box ID:0
Node WWN:50800200001cb798 Enclosure Name:FCloop2
SSC100’s - 0=Base Bkpln, 1=Base LoopB,
           2=Exp Bkpln, 3=Exp LoopB
SSC100 #0: Not Installed
SSC100 #1: Not Installed
SSC100 #2: O.K.(922A/ 8D3C)
SSC100 #3: O.K.(922A/ 8D3C)
Temperature Sensors - 0 Base, 1 Expansion
0:23ºC
1 Not Installed
Backplanes - A=Base, B=Expansion
A: Not Installed
B: O.K.
Default Language is USA English, ASCII]

Because the output above is for FCloop2, we are most likely looking at the Expansion Backplane.

Further down in the output, the various loops and backplanes are listed. SSC100 #2 and #3 are the indicators for the Expansion Backplane.

They both show a status of O.K. The Base Backplane shows as Not Installed because the backplanes are split, so the HBA that is accessing this enclosure cannot see the Base Backplane.

In this example we have a dual backplane, the backplanes are joined, and the disks are controlled by 2 controllers (Loop A & B):

# luxadm display 50800200001f6978

                SUNWGS INT FCBPL
                 DISK STATUS
SLOT   DISKS             (Node WWN)
0      On (O.K.)         20000011c6f0c79c
1      On (O.K.)         20000011c6f0caf3
2      On (O.K.)         20000011c6f0db5a
3      On (O.K.)         500000e01147d320
4      On (O.K.)         20000011c6f0d5a5
5      On (O.K.)         20000011c6f0ca78
6      On (O.K.)         500000e01145e3f0
7      On (O.K.)         500000e011430b20
8      On (O.K.)         500000e01145bcb0
9      On (O.K.)         500000e01145d3a0
10     On (O.K.)         20000011c6f0cb40
11     On (O.K.)         20000011c6f0df73
                SUBSYSTEM STATUS
FW Revision:922A   Box ID:0
  Node WWN:50800200001f6978   Enclosure Name:FCloop
SSC100's - 0=Base Bkpln, 1=Base LoopB, 2=Exp Bkpln, 3=Exp LoopB
    SSC100 #0:    O.K.(922A/ 8D3C)
    SSC100 #1:    O.K.(922A/ 8D3C)
    SSC100 #2:    O.K.(922A/ 8D3C)
    SSC100 #3:    O.K.(922A/ 8D3C)
          Temperature Sensors - 0 Base, 1 Expansion
          0:26ºC 1:27ºC  (All temperatures are NORMAL.)
Backplanes - A=Base, B=Expansion
        A: O.K.
        B: O.K.
Default Language is USA English, ASCII]

The first thing to note is that all 12 disks are present in a single enclosure. This means the backplanes are joined. Notice that each backplane is connected to Loop A & B.

SSC100 #0, #1, #2, and #3 all show as O.K. We also see near the bottom that Backplanes A & B are O.K.
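
The reasoning used on the two `luxadm display` examples can be sketched in Python. This is a heuristic illustration only: summarize_display is a hypothetical helper, the sample text is an abbreviated copy of the FCloop2 output above, and the classification strings simply restate the rules of thumb from this section.

```python
import re

def summarize_display(text):
    """Summarize slots and SSC100 status from `luxadm display <enclosure>` text."""
    slots, ssc, in_disks = {}, {}, False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("DISK STATUS"):
            in_disks = True
        elif line.startswith("SUBSYSTEM STATUS"):
            in_disks = False
        elif in_disks:
            # Slot rows look like "6 On (O.K.) <WWN>" or "0 Not Installed".
            m = re.match(r"(\d+)\s+(On \([^)]*\)|Not Installed)", line)
            if m:
                slots[int(m.group(1))] = m.group(2)
        else:
            m = re.match(r"SSC100 #(\d):\s*(O\.K\.|Not Installed)", line)
            if m:
                ssc[int(m.group(1))] = m.group(2)
    base = ssc.get(0) == "O.K."   # SSC100 #0 = Base Bkpln
    exp = ssc.get(2) == "O.K."    # SSC100 #2 = Exp Bkpln
    if base and exp:
        view = "joined: base and expansion backplanes in one enclosure"
    elif exp:
        view = "expansion backplane only: likely a split configuration"
    elif base:
        view = "base backplane only: single backplane or base half of a split"
    else:
        view = "unknown"
    return {
        "ok_slots": sorted(s for s, st in slots.items() if st == "On (O.K.)"),
        "loop_b": ssc.get(1) == "O.K." or ssc.get(3) == "O.K.",
        "view": view,
    }

# Abbreviated copy of the FCloop2 example output above.
SAMPLE = """SUNWGS INT FCBPL
DISK STATUS
SLOT DISKS (Node WWN)
5 Not Installed
6 On (O.K.) 20000004cf4cfd9b
7 On (O.K.) 200000203790f8d0
SUBSYSTEM STATUS
SSC100 #0: Not Installed
SSC100 #1: Not Installed
SSC100 #2: O.K.(922A/ 8D3C)
SSC100 #3: O.K.(922A/ 8D3C)
Temperature Sensors - 0 Base, 1 Expansion
1 Not Installed
"""

print(summarize_display(SAMPLE))
```

Slot rows are only read between the DISK STATUS and SUBSYSTEM STATUS headings, so the temperature sensor line "1 Not Installed" is not mistaken for a disk slot.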

The drawing "Dual Backplane Loop A & B" has details of how this configuration is wired.

    • Find the Disk Slot

After the configuration type is determined, the next step toward replacing the disk is to find which slot the disk is installed in.

In all of the 'luxadm display <enclosure>' output above, the Disk SLOT number is the slot on the V880/V890.

Often the 'luxadm display <enclosure>' output is missing from the explorer, or is not in the data sent by the customer. The slot can also be determined from the `luxadm -e dump_map` output.

# luxadm -e dump_map -devices pci@8,600000-SUNW,qlc@2-fp@0,0:devctl
Pos AL_PA ID Hard_Addr Port WWN         Node WWN         Type
0     1   7d    0      21000003ba8c4565 20000003ba8c4565 0x1f (Unknown Type,Host Bus Adapter)
1     ef  0     ef     21000011c6f0c79c 20000011c6f0c79c 0x0  (Disk device)
2     e8  1     e8     21000011c6f0caf3 20000011c6f0caf3 0x0  (Disk device)
3     e4  2     e4     21000011c6f0db5a 20000011c6f0db5a 0x0  (Disk device)
4     dc  6     dc     50800200001f6979 50800200001f6978 0xd  (SES device)
5     e2  3     e2     500000e01147d321 500000e01147d320 0x0  (Disk device)
6     e1  4     e1     21000011c6f0d5a5 20000011c6f0d5a5 0x0  (Disk device)
7     e0  5     e0     21000011c6f0ca78 20000011c6f0ca78 0x0  (Disk device)
8     d9  8     d9     500000e01145e3f1 500000e01145e3f0 0x0  (Disk device)
9     d6  9     d6     500000e011430b21 500000e011430b20 0x0  (Disk device)
10    d5  a     d5     500000e01145bcb1 500000e01145bcb0 0x0  (Disk device)
11    d4  b     d4     500000e01145d3a1 500000e01145d3a0 0x0  (Disk device)
12    d3  c     d3     21000011c6f0cb40 20000011c6f0cb40 0x0  (Disk device)
13    d2  d     d2     21000011c6f0df73 20000011c6f0df73 0x0  (Disk device)

 

 

AL_PA 

In the luxadm output above, the AL_PA of the disk's Node WWN can be compared to the AL_PA to Slot Chart to determine the disk slot.

For example, disk WWN 20000011c6f0cb40 has an AL_PA of d3. AL_PA d3 is associated with Disk Slot 10.
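
The lookup can be sketched in Python. Note the assumption: the AL_PA_TO_SLOT table below is not copied from the AL_PA to Slot Chart. It was inferred by matching the slot numbers in the `luxadm display` examples in this document against the AL_PA column of the corresponding dump_map examples, so treat it as illustrative and verify against the chart. The helper name slot_for_wwn is hypothetical.

```python
# Assumption: inferred from the example outputs in this document,
# not taken from the official AL_PA to Slot Chart.
AL_PA_TO_SLOT = {"ef": 0, "e8": 1, "e4": 2, "e2": 3, "e1": 4, "e0": 5,
                 "d9": 6, "d6": 7, "d5": 8, "d4": 9, "d3": 10, "d2": 11}

def slot_for_wwn(dump_map_text, node_wwn):
    """Find a disk's slot from `luxadm -e dump_map` text and its Node WWN."""
    wanted = node_wwn.lower()
    for line in dump_map_text.splitlines():
        fields = line.split()
        # Columns: Pos, AL_PA, ID, Hard_Addr, Port WWN, Node WWN, Type ...
        if len(fields) >= 6 and fields[5].lower() == wanted:
            return AL_PA_TO_SLOT.get(fields[1].lower())
    return None

# Three rows copied from the dump_map example above.
SAMPLE_MAP = """Pos AL_PA ID Hard_Addr Port WWN         Node WWN         Type
4     dc  6     dc     50800200001f6979 50800200001f6978 0xd  (SES device)
12    d3  c     d3     21000011c6f0cb40 20000011c6f0cb40 0x0  (Disk device)
13    d2  d     d2     21000011c6f0df73 20000011c6f0df73 0x0  (Disk device)
"""

print(slot_for_wwn(SAMPLE_MAP, "20000011c6f0cb40"))
```

The function returns None for devices that are not in the table, such as the SES device or the host bus adapter.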

 

The disk target as it appears in format can also be useful in some situations. With a Dual Backplane Joined configuration, the disks are numbered in the chassis as follows:

Disk0 cXt0d0           Disk6 cXt8d0
Disk1 cXt1d0           Disk7 cXt9d0
Disk2 cXt2d0           Disk8 cXt10d0
Disk3 cXt3d0           Disk9 cXt11d0
Disk4 cXt4d0           Disk10 cXt12d0
Disk5 cXt5d0           Disk11 cXt13d0

Notice that the targets go from t0 to t5, then t8 to t13.   If format shows a similar target layout, it is a good assumption that the disks are V880/V890 Internal FC drives. 
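
The joined-configuration numbering in the table above amounts to a small mapping, sketched here in Python. The helper name slot_from_joined_target is hypothetical, and the mapping applies only to the Dual Backplane Joined layout.

```python
import re

def slot_from_joined_target(device):
    """Map a cXtNdN name to a chassis slot in a Dual Backplane Joined layout."""
    m = re.match(r"c\d+t(\d+)d\d+", device)
    if not m:
        raise ValueError("not a cXtNdN device name: " + device)
    target = int(m.group(1))
    if 0 <= target <= 5:        # t0..t5  -> Disk0..Disk5
        return target
    if 8 <= target <= 13:       # t8..t13 -> Disk6..Disk11
        return target - 2
    raise ValueError("target %d is not an internal disk target" % target)

for name in ("c1t0d0", "c1t8d0", "c1t12d0"):
    print(name, "-> slot", slot_from_joined_target(name))
```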

With a Split configuration, it is not as simple to determine. The Expansion Backplane is seen as a second enclosure, and its device targets start at t0 again on a different controller:

Disk0 cXt0d0           Disk6 cYt0d0
Disk1 cXt1d0           Disk7 cYt1d0
Disk2 cXt2d0           Disk8 cYt2d0
Disk3 cXt3d0           Disk9 cYt3d0
Disk4 cXt4d0           Disk10 cYt4d0
Disk5 cXt5d0           Disk11 cYt5d0
    • Note on V480/V490:   The AL_PAs for Disk 0 and Disk 1 are the same as for the V880/V890 (EF & E8).   The target for the V480 and V490 is always the disk slot number.
    •  Volume Manager Considerations

 Solaris Volume Manager (SVM) and Veritas Volume Manager (VXVM) provide a means to control and use the two paths to the disk as one device.

SVM uses MPXIO, while VXVM uses Dynamic Multi-Pathing (DMP).

Customers are strongly encouraged to use one of these technologies when using two paths to access an internal disk on V480/V490/V880/V890.

With both technologies, the system administrator configures applications to access a meta-device, which is aware of both paths to the disk and can load balance and fail over as required.

 

From a troubleshooting standpoint, this gives another source of errors that needs to be examined.  The errors will often call out the meta-device, the physical device, or both.

 

 The device in VXVM looks very similar to a normal Solaris cXtXdX name, for example /dev/vx/dmp/c1t0d0.   The path differs from the usual Solaris disk path in that it contains vx/dmp.

Veritas uses the first path as the meta-device name, so the DMP device c1t0d0 is equivalent to the Solaris device c1t0d0, with potentially another path to the same disk.

The `vxdisk list <disk>` command can be used to see both paths.

 

# vxdisk list c1t0d0s2
Device:    c1t0d0s2
devicetag: c1t0d0
type:      auto
hostid:    ssprd100
disk:      name=rootdisk id=1202852640.19.ssprd100
group:     name=rootdg2 id=1202852643.21.ssprd100
info:      format=sliced,privoffset=1,pubslice=4,privslice=3
flags:     online ready private autoconfig autoimport imported
pubpaths:  block=/dev/vx/dmp/c1t0d0s4 char=/dev/vx/rdmp/c1t0d0s4
privpaths: block=/dev/vx/dmp/c1t0d0s3 char=/dev/vx/rdmp/c1t0d0s3
guid:      -
udid:      SEAGATE%5FST314670FSUN146G%5FDISKS%5F30353234343147445A360000
site:      -
version:   2.1
iosize:    min=512 (bytes) max=2048 (blocks)
public:    slice=4 offset=1 len=286596863 disk_offset=0
private:   slice=3 offset=1 len=101759 disk_offset=286596864
update:    time=1302745617 seqno=0.52
ssb:       actual_seqno=0.0
headers:   0 248
configs:   count=1 len=75084
logs:      count=1 len=11376
Defined regions:
 config   priv 000017-000247[000231]: copy=01 offset=000000 enabled
 config   priv 000249-075101[074853]: copy=01 offset=000231 enabled
 log      priv 075102-086477[011376]: copy=01 offset=000000 enabled
Multipathing information:
numpaths:   2
c1t0d0s2        state=enabled
c2t0d0s2        state=enabled

 

Near the bottom, it can be seen that there are 2 paths to this disk, one on c1 and one on c2.   Both paths show up in format as they normally would.   From format you can get the complete device path.
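
Pulling the path list out of saved `vxdisk list` output can be sketched in Python as follows. The helper name dmp_paths is hypothetical, and the sample text is an abbreviated copy of the example output above.

```python
import re

def dmp_paths(vxdisk_output):
    """Return {path: state} from the 'Multipathing information' section."""
    paths, in_section = {}, False
    for raw in vxdisk_output.splitlines():
        line = raw.strip()
        if line.startswith("Multipathing information:"):
            in_section = True
            continue
        if in_section:
            # Path rows look like "c1t0d0s2        state=enabled".
            m = re.match(r"(\S+)\s+state=(\S+)", line)
            if m:
                paths[m.group(1)] = m.group(2)
    return paths

# Abbreviated copy of the vxdisk list example above.
SAMPLE = """Device:    c1t0d0s2
devicetag: c1t0d0
flags:     online ready private autoconfig autoimport imported
Multipathing information:
numpaths:   2
c1t0d0s2        state=enabled
c2t0d0s2        state=enabled
"""

print(dmp_paths(SAMPLE))
```

A path whose state is anything other than enabled is a natural starting point when matching DMP errors to a physical controller.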

The luxadm  command behaves as normal with VXVM and DMP.

Solaris Volume Manager behaves differently with respect to device paths when enabled.  SVM removes the underlying devices in /dev/dsk and leaves a single meta-device.  

This meta-device can be recognized by the scsi_vhci in the device path.

 

# format

AVAILABLE DISK SELECTIONS:
       0. c8t600C0FF00000000008597D2C927DD500d0 <SUN-StorEdge3510-327R cyl 35010 alt 2 hd 64 sec 255>
          /scsi_vhci/ssd@g600c0ff00000000008597d2c927dd500
       1. c8t600C0FF00000000008597D74093F6100d0 <SUN-StorEdge3510-327R cyl 43763 alt 2 hd 64 sec 255>
          /scsi_vhci/ssd@g600c0ff00000000008597d74093f6100
       2. c8t20000004CF98CE56d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /scsi_vhci/ssd@g20000004cf98ce56
       3. c8t20000000871D9B6Ad0 <HITACHI-DK32EJ72FSUN72G-2Q0F cyl 14087 alt 2 hd 24 sec 424>
          /scsi_vhci/ssd@g20000000871d9b6a
       4. c8t20000000871D93ACd0 <HITACHI-DK32EJ72FSUN72G-2Q0F cyl 14087 alt 2 hd 24 sec 424>
          /scsi_vhci/ssd@g20000000871d93ac
       5. c8t20000000871DBE5Cd0 <HITACHI-DK32EJ72FSUN72G-2Q0F cyl 14087 alt 2 hd 24 sec 424>
          /scsi_vhci/ssd@g20000000871dbe5c
       6. c8t20000000871DBE9Fd0 <HITACHI-DK32EJ72FSUN72G-2Q0F cyl 14087 alt 2 hd 24 sec 424>
          /scsi_vhci/ssd@g20000000871dbe9f


- Deleted Text -


 

The other noticeable difference is the very long target number, 20000000871D9B6A, which is the disk WWN.

The hexadecimal digits in the WWN are all converted to upper case so that the d0 in the cXtXdX name is not mistaken for part of the WWN.

That WWN can then be located in the various luxadm outputs to determine the physical disk location.
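
Extracting the WWN from such a device name can be sketched in Python. The helper name wwn_from_mpxio_name is hypothetical. The pattern relies on the upper-case convention described above: the 16 upper-case hexadecimal digits after "t" are the WWN, and the lower-case "d" that follows starts the disk number. The result is lower-cased so it can be matched against luxadm output.

```python
import re

# 16 upper-case hex digits between "t" and the lower-case "d<N>".
_MPXIO_NAME = re.compile(r"^c\d+t([0-9A-F]{16})d\d+(?:s\d+)?$")

def wwn_from_mpxio_name(name):
    """Return the lower-case disk WWN from a cXt<WWN>dN name, else None."""
    m = _MPXIO_NAME.match(name)
    return m.group(1).lower() if m else None

print(wwn_from_mpxio_name("c8t20000000871D9B6Ad0"))
# Names with a 32-digit identifier, or a plain target number, do not match.
print(wwn_from_mpxio_name("c8t600C0FF00000000008597D2C927DD500d0"))
print(wwn_from_mpxio_name("c1t0d0"))
```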

For additional DMP and MPXIO troubleshooting, see the References section below.

 

 

  • Common Issues
    • "# format" showing high target ID's assigned to drives & OBP Message - "WARNING: Internal FCAL selectable ID's have been reprogrammed with value 0"

HELP Problem and Error Message:


After installing an Expansion BP the Solaris OS command "format" printed out the following (Showing high target ID's assigned to drives in the Expansion BP) and a "warning" message from OBP.

# format

0. c1t0d0 /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cf6208d9,0
1. c1t1d0 /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cf6219e8,0
2. c1t2d0 /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cf620c9f,0
3. c1t3d0 /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cf620ce2,0
4. c1t4d0 /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cf620797,0
5. c1t5d0 /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cf62176e,0
6. c1t21000004CF275DA6d0 /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cf275da6,0
7. c1t21000004CF275E08d0 /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cf275e08,0
8. c1t21000004CF275E73d0 /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cf275e73,0
9. c1t21000004CF275E82d0 /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cf275e82,0
10. c1t21000004CF275F11d0 /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cf275f11,0
11. c1t21000004CF275948d0 /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cf275948,0
12. c4t0d0 /pci@9,600000/pci@1/SUNW,qlc@4/fp@0,0/ssd@w22000004cf6208d9,0
13. c4t1d0 /pci@9,600000/pci@1/SUNW,qlc@4/fp@0,0/ssd@w22000004cf6219e8,0
14. c4t2d0 /pci@9,600000/pci@1/SUNW,qlc@4/fp@0,0/ssd@w22000004cf620c9f,0
15. c4t3d0 /pci@9,600000/pci@1/SUNW,qlc@4/fp@0,0/ssd@w22000004cf620ce2,0
16. c4t4d0 /pci@9,600000/pci@1/SUNW,qlc@4/fp@0,0/ssd@w22000004cf620797,0
17. c4t5d0 /pci@9,600000/pci@1/SUNW,qlc@4/fp@0,0/ssd@w22000004cf62176e,0
18. c4t22000004CF275DA6d0 /pci@9,600000/pci@1/SUNW,qlc@4/fp@0,0/ssd@w22000004cf275da6,0
19. c4t22000004CF275E08d0 /pci@9,600000/pci@1/SUNW,qlc@4/fp@0,0/ssd@w22000004cf275e08,0
20. c4t22000004CF275E73d0 /pci@9,600000/pci@1/SUNW,qlc@4/fp@0,0/ssd@w22000004cf275e73,0
21. c4t22000004CF275E82d0 /pci@9,600000/pci@1/SUNW,qlc@4/fp@0,0/ssd@w22000004cf275e82,0
22. c4t22000004CF275F11d0 /pci@9,600000/pci@1/SUNW,qlc@4/fp@0,0/ssd@w22000004cf275f11,0
23. c4t22000004CF275948d0 /pci@9,600000/pci@1/SUNW,qlc@4/fp@0,0/ssd@w22000004cf275948,0


OBP Message:

WARNING: Internal FCAL selectable ID's have been reprogrammed with value 0
You MUST power-cycle the entire FC-AL loop (including this system)
for changes to take effect!




TIP Solution/Workaround

The high target ID's can be a result of the loop(s) not getting initialized correctly. Clean up these devices under Solaris and issue another "boot -r".

If the high target ID's persist and you are also seeing the above OBP "warning" message, chances are the Expansion BP is not being identified correctly.

Reboot the system again. If you see the OBP "warning" message on every reboot, then you most likely have a bad Expansion BP. Replace the Expansion Backplane.
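
Spotting the symptom in saved format output can be sketched in Python. The helper name high_target_entries is hypothetical. It flags entries whose target is a 16-digit WWN while the device path is not under /scsi_vhci, since WWN-style targets are expected for multipathed meta-devices. The sample combines two lines from the listing above with one scsi_vhci entry from the format example earlier in this document.

```python
import re

# Entry line: "<n>. c<N>t<16 hex digits>d<N>" followed by the rest of the line.
ENTRY = re.compile(r"^\s*\d+\.\s+(c\d+t[0-9A-F]{16}d\d+)\b(.*)$")

def high_target_entries(format_output):
    """Return device names that have WWN-style targets on a non-scsi_vhci path."""
    lines = format_output.splitlines()
    hits = []
    for i, line in enumerate(lines):
        m = ENTRY.match(line)
        if not m:
            continue
        rest = m.group(2)
        # The listing above shows the path on the same line; the scsi_vhci
        # example shows it on the following line. Handle both layouts.
        if "/" in rest:
            path = rest
        else:
            path = lines[i + 1] if i + 1 < len(lines) else ""
        if "/scsi_vhci/" not in path:
            hits.append(m.group(1))
    return hits

SAMPLE = """5. c1t5d0 /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cf62176e,0
6. c1t21000004CF275DA6d0 /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cf275da6,0
       2. c8t20000004CF98CE56d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /scsi_vhci/ssd@g20000004cf98ce56
"""

print(high_target_entries(SAMPLE))
```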

 

    • "Fast Data Access MMU Miss" during OBP/POST and no disk LED's are illuminated in Base BP (but are in Expansion BP).

HELP Problem and Error Message:

"Fast Data Access MMU Miss" during OBP/POST and also during "reset-all".

Checking for disks with "probe-scsi-all" comes up empty, and no disk LED's are illuminated in the Base BP (but they are in the Expansion BP).

0>@(#) Daktari POST 4.5.6 2002/01/04 15:54
/dat/fw/work/staff/firmware_re/post/post-build-4.5.6_020104/daktari/integrated (firmware_re)
0>Jump from OBP->POST.
0>CPUs present in system: 0 2
0>First init (low-level [JTAG] hardware init).
0>MFG scrpt mode set to NONE
0>I/O port set to serial TTYA.
0>POST JTAG Init...
0>Probe Ecache
0>Scrub and Setup Ecache
2>Probe Ecache
2>Scrub and Setup Ecache
0>Initializing Scan Database
0>Mask DAR errors off
0>Init MDR DTL
0>Init DAR DTL
0>Init DCS DTL
0>Run POST from Memory
0>Scrub Init Memory
0>Schizo unit 0 init test
0>Schizo unit 1 init test
0>
0>Turn Cheetah 0 errors on
0>Turn Cheetah 2 errors on
0>Turn Module A DCDS errors on
0>Turn DCS errors on
0>Turn DAR errors on
0>INFO:
0> POST Passed all devices.
0>POST: Return to OBP.


CPU0: System Power On Selftest Completed
Pass/Fail Status = 0000.0000.0000.0000
ESB Overall Status = ffff.ffff.ffff.ffff


Fast Data Access MMU Miss
ok


ok reset-all
Resetting ...


Fast Data Access MMU Miss
ok

ok probe-scsi-all
ok





TIP Solution/Workaround

Since the Expansion BP is receiving power but the Base BP isn't, there is a good chance the PDB is OK, but don't rule it out. Check voltages at P16 (which plugs into J0100 of the FCAL Base BP).

1. If all P16 voltages are present: Replace Base Backplane
2. If one or more P16 voltages are missing: Inspect/Reseat "PDU(P15) to Base BP(P16) cable (530-2841)" and recheck P16 voltages.
3. If one or more P16 voltages are missing: Check voltages at P15.
3a. If one or more P15 voltages are missing: Replace PDB
3b. If all P15 voltages are present: Replace cable (530-2841)
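
The decision steps above can be written out as a small Python function, which makes the order of checks explicit. This is only a restatement of the list: the function name and its parameters are hypothetical, and the measurements themselves still have to be taken by hand.

```python
def base_bp_next_action(p16_all_present, cable_reseated=False,
                        p15_all_present=None):
    """Return the next step for the 'Base BP has no power' case above."""
    if p16_all_present:
        return "Replace Base Backplane"                           # step 1
    if not cable_reseated:
        return "Inspect/reseat cable 530-2841 and recheck P16"    # step 2
    if p15_all_present is None:
        return "Check voltages at P15"                            # step 3
    if p15_all_present:
        return "Replace cable 530-2841"                           # step 3b
    return "Replace PDB"                                          # step 3a

print(base_bp_next_action(False))
print(base_bp_next_action(False, cable_reseated=True, p15_all_present=False))
```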

 

    • "Fast Data Access MMU Miss" during OBP/POST and no disk LED's are illuminated in the Base and Expansion BP's.

HELP Problem and Error Message:

"Fast Data Access MMU Miss" during OBP/POST and also during "reset-all". Checking for disks comes up empty with, "probe-scsi-all" and no disk LED's are illuminated in the Base and Expansion BP's.

0>@(#) Daktari POST 4.5.6 2002/01/04 15:54
/dat/fw/work/staff/firmware_re/post/post-build-4.5.6_020104/daktari/integrated (firmware_re)
0>Jump from OBP->POST.
0>CPUs present in system: 0 2
0>First init (low-level [JTAG] hardware init).
0>MFG scrpt mode set to NONE
0>I/O port set to serial TTYA.
0>POST JTAG Init...
0>Probe Ecache
0>Scrub and Setup Ecache
2>Probe Ecache
2>Scrub and Setup Ecache
0>Initializing Scan Database
0>Mask DAR errors off
0>Init MDR DTL
0>Init DAR DTL
0>Init DCS DTL
0>Run POST from Memory
0>Scrub Init Memory
0>Schizo unit 0 init test
0>Schizo unit 1 init test
0>
0>Turn Cheetah 0 errors on
0>Turn Cheetah 2 errors on
0>Turn Module A DCDS errors on
0>Turn DCS errors on
0>Turn DAR errors on
0>INFO:
0> POST Passed all devices.
0>POST: Return to OBP.


CPU0: System Power On Selftest Completed
Pass/Fail Status = 0000.0000.0000.0000
ESB Overall Status = ffff.ffff.ffff.ffff


Fast Data Access MMU Miss
ok


ok reset-all
Resetting ...


Fast Data Access MMU Miss
ok

ok probe-scsi-all
ok





TIP Solution/Workaround

Since disk LED's are not illuminated in either the Base or Expansion Backplane, we need to check power at P15 and P42 of the PDU:

1. If one or more P15/P42 voltages are missing: Replace PDB
2. If all P15/P42 voltages are present: Check voltages at P43 (Expansion BP). Note: If power isn't getting to the Expansion BP
(i.e. bad 530-2863 cable), this can cause both backplanes (Base and Expansion) to have no illuminated drive LED's.
3. If one or more voltages are missing at P43: Inspect/Reseat "PDU(P42) to Expansion BP(P43) cable (530-2863)" and recheck P43 voltages, but if same results: Replace cable (530-2863)
4. If all P43 voltages are present: Chances are one of your FCAL backplanes is causing the problem. Replace Expansion BP first. If it still fails: Replace Base BP

    • No disk LED's illuminated in Expansion BP, "backplane 1" errors are present during "reset-all"...

HELP Problem and Error Message:

No visible errors during OBP/POST, no disk LED's illuminated in Expansion BP, "backplane 1" errors are present during "reset-all", and no disks (just controllers) are seen during a "probe-scsi-all".

{3} ok reset-all
Resetting ...

screen not found.
keyboard not found.
Keyboard not present. Using ttya for input and output.

Sun Fire 880, No Keyboard
Copyright 1998-2002 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.5, 6144 MB memory installed, Serial #50908687.
Ethernet address 0:3:ba:8:ce:f, Host ID: 8308ce0f.

ERROR: Couldn't read value from FCAL Disk Backplane 1 !
ERROR: Can't set internal FCAL Disk Backplane selectable ID's!
ERROR: Could not set SSC050 TX DIS bit.
{3} ok

{3} ok probe-scsi-all
/pci@9,600000/pci@1/SUNW,qlc@5

/pci@9,600000/pci@1/SUNW,qlc@4

/pci@8,600000/SUNW,qlc@2

/pci@8,700000/scsi@1
Target 6
Unit 0 Removable Read Only device TOSHIBA DVD-ROM SD-M14011007

{3} ok




TIP Solution/Workaround

Inspect/Reseat "I/O Board (P18/J3805) to Base BP (P19/J0800) cable (530-2840)" and retry, but if same results: Replace cable (530-2840)

 

 

    • No disk LED's illuminated in Expansion BP, "backplane 0" and "Can't access LM75 device!" errors are present during "reset-all"


HELP Problem and Error Message:

No disk LED's illuminated in Expansion BP, "backplane 0" and "Can't access LM75 device!" errors are present during "reset-all" & OBP/POST, and high target ID's are seen during a "probe-scsi-all".

screen not found.
keyboard not found.
Keyboard not present. Using ttya for input and output.


Sun Fire 880, No Keyboard
Copyright 1998-2002 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.5, 6144 MB memory installed, Serial #50908687.
Ethernet address 0:3:ba:8:ce:f, Host ID: 8308ce0f.


ERROR: Couldn't read value from FCAL Disk Backplane 0 !
ERROR: Can't set internal FCAL Disk Backplane selectable ID's!
ERROR: Could not set SSC050 TX DIS bit.
{3} ok
ERROR: Can't access LM75 device!

{3} ok reset-all
Resetting ...



screen not found.
keyboard not found.
Keyboard not present. Using ttya for input and output.


Sun Fire 880, No Keyboard
Copyright 1998-2002 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.5, 6144 MB memory installed, Serial #50908687.
Ethernet address 0:3:ba:8:ce:f, Host ID: 8308ce0f.


ERROR: Couldn't read value from FCAL Disk Backplane 0 !
ERROR: Can't set internal FCAL Disk Backplane selectable ID's!
ERROR: Could not set SSC050 TX DIS bit.
{3} ok
ERROR: Can't access LM75 device!
ERROR: Can't access LM75 device!

{3} ok probe-scsi-all
/pci@9,600000/pci@1/SUNW,qlc@5

/pci@9,600000/pci@1/SUNW,qlc@4
LiD HA LUN --- Port WWN --- ----- Disk description -----

/pci@8,600000/SUNW,qlc@2
ERROR: Can't access LM75 device!
LiD HA LUN --- Port WWN --- ----- Disk description -----
0 0 0 2100002037f3d422 SEAGATE ST336704FSUN36G 0726
7e 7e 0 7b00000000000000 SUNW SUNWGS INT FCBPL9226

/pci@8,700000/scsi@1
Target 6
Unit 0 Removable Read Only device TOSHIBA DVD-ROM SD-M14011007

{3} ok
ERROR: Can't access LM75 device!


TIP Solution/Workaround

Inspect/Reseat I/O Board (P18/J3805) to Expansion BP (P20/J0800) cable (530-2840) and retry, but if same results: Replace cable (530-2840)

If the problem persists, replace the FCAL backplane (501-5993) after doing further troubleshooting steps to determine which backplane is failing. See Further DPM Troubleshooting Steps.

 

 

    • "probe-scsi-all" is showing a "Loop ID (LiD)" of 7d (125 decimal, highest possible ID on the loop) ....

HELP Problem and Error Message:

"probe-scsi-all" is showing a "Loop ID (LiD)" of 7d (125 decimal, highest possible ID on the loop) and two "Hard Address (HA)" 6's. Neither is valid.

Error Output:

{2} ok probe-scsi-all
/pci@9,600000/pci@1/SUNW,qlc@5

/pci@9,600000/pci@1/SUNW,qlc@4
LiD HA LUN --- Port WWN --- ----- Disk description -----
1 0 0 2200002037f3d422 SEAGATE ST336704FSUN36G 0726
7d 6 0 508002000011e432 SUNW SUNWGS INT FCBPL9226
0 0 0 2200002037bd36be SEAGATE ST318304FSUN18G 0726
6 6 0 5080020000183a4a SUNW SUNWGS INT FCBPL9226
5 5 0 2200002037d9ff56 SEAGATE ST318304FSUN18G 0726

/pci@8,600000/SUNW,qlc@2
LiD HA LUN --- Port WWN --- ----- Disk description -----
1 0 0 2100002037f3d422 SEAGATE ST336704FSUN36G 0726
7d 6 0 508002000011e431 SUNW SUNWGS INT FCBPL9226
0 0 0 2100002037bd36be SEAGATE ST318304FSUN18G 0726
6 6 0 5080020000183a49 SUNW SUNWGS INT FCBPL9226
5 5 0 2100002037d9ff56 SEAGATE ST318304FSUN18G 0726

/pci@8,700000/scsi@1
Target 6
Unit 0 Removable Read Only device TOSHIBA DVD-ROM SD-M14011007

{2} ok




Good Output:

{2} ok probe-scsi-all
/pci@9,600000/pci@1/SUNW,qlc@5

/pci@9,600000/pci@1/SUNW,qlc@4
LiD HA LUN --- Port WWN --- ----- Disk description -----
0 0 0 2200002037f3d422 SEAGATE ST336704FSUN36G 0726
6 6 0 508002000011e432 SUNW SUNWGS INT FCBPL9226
8 8 0 2200002037bd36be SEAGATE ST318304FSUN18G 0726
d d 0 2200002037d9ff56 SEAGATE ST318304FSUN18G 0726

/pci@8,600000/SUNW,qlc@2
LiD HA LUN --- Port WWN --- ----- Disk description -----
0 0 0 2100002037f3d422 SEAGATE ST336704FSUN36G 0726
6 6 0 508002000011e431 SUNW SUNWGS INT FCBPL9226
8 8 0 2100002037bd36be SEAGATE ST318304FSUN18G 0726
d d 0 2100002037d9ff56 SEAGATE ST318304FSUN18G 0726

/pci@8,700000/scsi@1
Target 6
Unit 0 Removable Read Only device TOSHIBA DVD-ROM SD-M14011007

{2} ok



TIP Solution/Workaround

One cable can cause this type of error: the FCAL Backplane ID cable, which is part of the cable (530-2863) that runs between the FCAL Backplanes.

Inspect/Reseat the "Base BP (P1) to Expansion BP (P2)" cable (530-2863) and retry. If the results are the same, replace cable (530-2863).
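
The check described above (a LiD of 7d, or a hard address that appears more than once on the same loop) can also be applied to a saved "probe-scsi-all" capture. The following is a minimal Python sketch, not a Sun or Oracle tool. The function names parse_probe and find_id_problems are hypothetical, and the row layout (LiD, HA, LUN, Port WWN, description) is assumed from the sample output in this document.

```python
import re
from collections import Counter

# Matches device rows such as:
#   7d 6 0 508002000011e432 SUNW SUNWGS INT FCBPL9226
ROW = re.compile(
    r"^([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]+)\s+([0-9a-f]{16})\s+(.*)$"
)

def parse_probe(text):
    """Return {controller_path: [(lid, ha, wwn, description), ...]}."""
    loops, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("/"):
            # A device path starts a new controller section.
            current = line
            loops[current] = []
        elif current is not None:
            m = ROW.match(line)
            if m:
                lid, ha, _lun, wwn, desc = m.groups()
                # LiD and HA are printed in hexadecimal.
                loops[current].append((int(lid, 16), int(ha, 16), wwn, desc))
    return loops

def find_id_problems(loops):
    """List LiD 7d entries and hard addresses repeated on one loop."""
    problems = []
    for path, rows in loops.items():
        for lid, _ha, wwn, _desc in rows:
            if lid == 0x7D:
                problems.append(f"{path}: LiD 7d on {wwn}")
        counts = Counter(ha for _lid, ha, _wwn, _desc in rows)
        for ha, n in counts.items():
            if n > 1:
                problems.append(f"{path}: hard address {ha:x} used {n} times")
    return problems
```

Run against the Error Output above, this sketch would report the 7d entry and the repeated hard address 6 on both qlc paths. It would also report hard address 0, which appears twice on each loop in that capture.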

 

    • "Loop is not up - no targets to test." and "ISP2200 reported that the loop is down." seen in OBDIAG:

HELP Problem and Error Message:

  ERROR   : Loop is not up - no targets to test.
  DEVICE  : /pci@8,600000/SUNW,qlc@2  <<<<<< Disks are connected to this path only
  SUBTEST : selftest:loop-tests
  CALLERS : (f012cfb8)
  MACHINE : Sun Fire 880
  SERIAL# : 51055597
  DATE    : 08/08/2014 08:02:29  GMT
  CONTROLS: diag-level=max test-args=subtests,verbose

Subtest loop-tests:lip-test

  ERROR   : ISP2200 reported that the loop is down.  <<<<<<<<<<<<<<<<<
  DEVICE  : /pci@8,600000/SUNW,qlc@2
  SUBTEST : selftest:loop-tests:lip-test
  CALLERS : lip-test
  MACHINE : Sun Fire 880
  SERIAL# : 51055597
  DATE    : 08/08/2014 08:02:29  GMT
  CONTROLS: diag-level=max test-args=subtests,verbose

  
TIP Solution/Workaround:

A bad disk may be preventing the loop from coming up. Remove all disks except one and test again. If the error is still seen, the disk you left installed may be the bad one; try a different disk.

If the loop comes up with that one disk, add the remaining disks back one at a time, testing after each, to locate the faulty disk by process of elimination. A faulty controller or disk backplane can also cause this error.
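
The elimination procedure above can be written down as a short routine. This is only a sketch of the logic in Python; the function name isolate_faulty_disks and the loop_comes_up callback are hypothetical. The callback stands for the manual step in which the operator seats exactly the listed disks, reruns the OBDIAG loop test, and reports whether the loop came up.

```python
def isolate_faulty_disks(slots, loop_comes_up):
    """Return (good, suspect) lists of disk slots.

    slots          -- slot identifiers to try, in order
    loop_comes_up  -- hypothetical callback: takes the list of installed
                      slots and returns True if the loop test passes
    """
    good, suspect = [], []
    for slot in slots:
        # Test the known-good disks plus one new candidate.
        if loop_comes_up(good + [slot]):
            good.append(slot)
        else:
            # Pull the candidate back out and carry on with the next disk.
            suspect.append(slot)
    return good, suspect
```

If every slot ends up in the suspect list, no single disk brought the loop up, which points toward the controller or the disk backplane rather than the disks.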

    • No disks showing up under the internal FCAL controller (qlc@2):

HELP Problem and Error Message: No disks showing up under the internal FCAL controller (qlc@2):

{2} ok probe-scsi-all
/pci@9,600000/pci@1/SUNW,qlc@5

/pci@9,600000/pci@1/SUNW,qlc@4
LiD HA LUN --- Port WWN --- ----- Disk description -----
0 0 0 2200002037f3d422 SEAGATE ST336704FSUN36G 0726
6 6 0 508002000011e432 SUNW SUNWGS INT FCBPL9226
8 8 0 2200002037bd36be SEAGATE ST318304FSUN18G 0726
d d 0 2200002037d9ff56 SEAGATE ST318304FSUN18G 0726

/pci@8,600000/SUNW,qlc@2

/pci@8,700000/scsi@1
Target 6
Unit 0 Removable Read Only device TOSHIBA DVD-ROM SD-M14011007

{2} ok




TIP Solution/Workaround

Inspect/Reseat the "Motherboard [(Loop A) A/FCAL(IN) & B/FCAL(OUT)] to Base BP [(Loop A) A/J0201 & B/J0200]" cable (530-2623) and retry. If the results are the same, replace cable (530-2623).

If an Expansion BP is present, cable (530-2621) could cause this problem too.

Oddly enough, if the link between "Base BP [(Loop A) E/J0500] & Expansion BP [(Loop A) B/J0200]" were to fail, you would also see the above displayed by "probe-scsi-all".

If cable (530-2623) does not fix the problem and you have an Expansion BP, inspect/reseat cable (530-2621) and retry. If the results are the same, replace cable (530-2621).

    • Only Base BP disks showing up under the internal FCAL controller (qlc@2):

 

HELP Problem and Error Message:

Only Base BP disks showing up under the internal FCAL controller (qlc@2):

{3} ok probe-scsi-all
/pci@9,600000/pci@1/SUNW,qlc@5

/pci@9,600000/pci@1/SUNW,qlc@4
LiD HA LUN --- Port WWN --- ----- Disk description -----
0 0 0 2200002037f3d422 SEAGATE ST336704FSUN36G 0726
6 6 0 508002000011e432 SUNW SUNWGS INT FCBPL9226
8 8 0 2200002037bd36be SEAGATE ST318304FSUN18G 0726
d d 0 2200002037d9ff56 SEAGATE ST318304FSUN18G 0726

/pci@8,600000/SUNW,qlc@2
LiD HA LUN --- Port WWN --- ----- Disk description -----
0 0 0 2100002037f3d422 SEAGATE ST336704FSUN36G 0726
6 6 0 508002000011e431 SUNW SUNWGS INT FCBPL9226

/pci@8,700000/scsi@1
Target 6
Unit 0 Removable Read Only device TOSHIBA DVD-ROM SD-M14011007






TIP Solution/Workaround

Inspect/Reseat the "Base BP [(Loop A) E/J0500 & F/J0501] to Expansion BP [(Loop A) A/J0201 & B/J0200]" cable (530-2621) and retry. If the results are the same, replace cable (530-2621).
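
The last two cases (nothing listed under qlc@2, or only Base BP devices listed) can be summarized as a small decision helper. This Python sketch is illustrative only and the function name suspect_cables is hypothetical. It assumes that hard addresses below 8 belong to the Base BP and addresses 8 and above to the Expansion BP; that split is inferred from the sample output in this document, where entries 0 and 6 remain and entries 8 and d disappear when only the Base BP is visible.

```python
def suspect_cables(hard_addresses, expansion_installed):
    """Return cable part numbers to inspect/reseat first.

    hard_addresses      -- HA values (integers) listed under qlc@2
    expansion_installed -- True if an Expansion BP is fitted
    """
    # Assumption from the sample output: HA 8 and above is the Expansion BP.
    base = [ha for ha in hard_addresses if ha < 8]
    expansion = [ha for ha in hard_addresses if ha >= 8]

    if not hard_addresses:
        # Nothing visible: Motherboard to Base BP cable first, then the
        # Base BP to Expansion BP cable if an Expansion BP is present.
        cables = ["530-2623"]
        if expansion_installed:
            cables.append("530-2621")
        return cables

    if expansion_installed and base and not expansion:
        # Only Base BP devices visible: Base BP to Expansion BP cable.
        return ["530-2621"]

    return []
```

For example, suspect_cables([0x0, 0x6], True) returns ["530-2621"]. An Expansion BP with no disks installed would give the same result, so confirm that disks are actually present there before replacing the cable.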

 

 

    • For further DPM troubleshooting steps, if your issue is not solved by the steps above, see the References section below

 


 

 

References

<NOTE:1002465.1> - How to Verify the Health of Disk Logical Unit Managed Under Solaris Multipathing Software (MPxIO)
<NOTE:1008827.1> - Using obdiag to troubleshoot internal drives on the Sun Fire(TM)V880 server.
<NOTE:1383378.1> - Sun Fire[TM] Servers (V280R, V480, V490, V880, V890):OBDiag Troubleshooting
<NOTE:1011302.1> - How to verify Dynamic Multipathing (DMP) health
SPLIT BACKPLANE CONFIGURATIONS: https://docs.oracle.com/cd/E19095-01/sfv880.srvr/817-4411-10/817-4411.html

  Copyright © 2018 Oracle, Inc.  All rights reserved.