Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1003736.1
Update Date:2017-01-04
Keywords:

Solution Type  Problem Resolution Sure

Solution  1003736.1 :   Avoiding SCSI Transport Errors while Running Explorer/Extractor/Sccli with Sun Storage 3310 Arrays  


Related Items
  • Sun Storage 3310 Array
Related Categories
  • PLA-Support>Sun Systems>DISK>Arrays>SN-DK: SE31xx_33xx_35xx
  • _Old GCS Categories>Sun Microsystems>Storage - Disk>Modular Disk - 3xxx Arrays

PreviouslyPublishedAs
205265


Applies to:

Sun Storage 3310 Array - Version Not Applicable and later
All Platforms

Symptoms

  • Bus resets.
  • Degraded performance.
  • ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0

When an explorer, extractor, or sccli command is run on a host connected to a Sun Storage 3310 SCSI Array while I/O to the array is in progress, some configurations show a long pause followed by bus resets. The bus resets abort the pending I/Os, and the resulting command retries may degrade array performance.

The following is an example of the messages:

Dec  6 10:00:08 xyz scsi: [ID 107833 kern.warning] WARNING: /pci@1d,700000/pci@1/scsi@4/sd@0,1 (sd79):
Dec  6 10:00:08 xyz   SCSI transport failed: reason 'reset': retrying command
Dec  6 10:00:11 xyz scsi: [ID 107833 kern.warning] WARNING: /pci@1d,700000/pci@1/scsi@4/sd@0,1 (sd79):
Dec  6 10:00:11 xyz    Error for Command: write(10)    Error Level: Retryable
Dec  6 10:00:11 xyz scsi: [ID 107833 kern.notice]  Requested Block: 310860508                 Error Block: 310860508
Dec  6 10:00:11 xyz scsi: [ID 107833 kern.notice]  Vendor: SUN                                Serial Number: 6215C0B5-00
Dec  6 10:00:11 xyz scsi: [ID 107833 kern.notice]  Sense Key: Unit Attention
Dec  6 10:00:11 xyz scsi: [ID 107833 kern.notice]  ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU:0x0
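
To gauge how often a given disk instance is affected, the messages file can be scanned for the reset signatures shown above. The following Python sketch is illustrative only: the function name and the embedded sample are assumptions, and it relies on each detail line following the WARNING header for the same device, as in the example messages.

```python
import re
from collections import Counter

# Abbreviated copy of the example messages shown above.
SAMPLE_LOG = """\
Dec  6 10:00:08 xyz scsi: [ID 107833 kern.warning] WARNING: /pci@1d,700000/pci@1/scsi@4/sd@0,1 (sd79):
Dec  6 10:00:08 xyz   SCSI transport failed: reason 'reset': retrying command
Dec  6 10:00:11 xyz scsi: [ID 107833 kern.warning] WARNING: /pci@1d,700000/pci@1/scsi@4/sd@0,1 (sd79):
Dec  6 10:00:11 xyz    Error for Command: write(10)    Error Level: Retryable
Dec  6 10:00:11 xyz scsi: [ID 107833 kern.notice]  ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU:0x0
"""

# Header line naming the device path and sd instance.
DEVICE_RE = re.compile(r"WARNING: (\S+) \((sd\d+)\):")
# The two reset signatures from the Symptoms section.
RESET_RE = re.compile(r"SCSI transport failed: reason 'reset'|ASC: 0x29 ")


def count_reset_events(log_text):
    """Count reset-related lines per sd instance.

    Each matching line is attributed to the most recent WARNING header
    seen before it (an assumption about message ordering).
    """
    counts = Counter()
    current = None
    for line in log_text.splitlines():
        header = DEVICE_RE.search(line)
        if header:
            current = header.group(2)
        elif current and RESET_RE.search(line):
            counts[current] += 1
    return counts


if __name__ == "__main__":
    for device, hits in count_reset_events(SAMPLE_LOG).items():
        print(device, hits)
```

In practice the text would be read from the system messages file rather than from the embedded sample.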

Cause

The problem occurs when Sun[TM] Explorer or Sun StorEdge[TM] 3000 Series Extractor (se3kxtr) is run on a host with attached Sun Storage 3310 SCSI Arrays, using version 1.6.2 or lower of the host management software (SUNWsccli) and firmware revision 4.11 or lower.

There are two issues associated with the cause:

  • a SAF-TE firmware bug in revisions prior to 1159.

  • a bug in the sscs agent component of the SUNWsccli software.
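
The version thresholds above can be checked mechanically once the installed revisions are known. The sketch below is a hedged illustration, not a supported tool: the function names are hypothetical, the version strings are assumed to be dotted numbers with an optional letter suffix (for example 4.13B), and each condition is reported separately because this document does not state how they combine.

```python
import re


def parse_version(text):
    """Turn '1.6.2' or '4.13B' into a tuple of ints, ignoring letter suffixes."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def exposure(sccli_version, firmware_rev, safte_rev):
    """Report which of the conditions named in the Cause section hold.

    All three arguments are strings supplied by the caller; how they are
    obtained from the host or the array is outside this sketch.
    """
    return {
        "sccli_1.6.2_or_lower": parse_version(sccli_version) <= (1, 6, 2),
        "firmware_4.11_or_lower": parse_version(firmware_rev) <= (4, 11),
        "safte_prior_to_1159": int(safte_rev) < 1159,
    }


if __name__ == "__main__":
    # Hypothetical values chosen to sit exactly on the thresholds.
    print(exposure("1.6.2", "4.11", "1158"))
```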

 

Solution

Ensure customers have the latest firmware and 2.x sccli software installed. 

Note: Please refer to the patch README for a detailed procedure on how to upgrade from 3.x firmware to 4.x if required.

SCSI transport errors while running explorer/extractor/sccli can be avoided in either of two ways:

1. Invoke SUNWsccli (sccli) out of band. This requires specifying the IP address of the
array. For example:

   # sccli ip-address

2. Ensure that there are no host channels defined for target IDs with un-mapped
logical units (LUNs).

As an example, for the SCSI messages shown in symptoms above, the following is the
associated configuration:

  • channels
Ch  Type   Media    Speed  Width   PID / SID
--------------------------------------------
 0  Drive  SCSI     80M    Wide    6  / 7
 1  Host   SCSI     80M    Wide    0  / 5    <-- Both Primary/Secondary IDs
 2  Drive  SCSI     80M    Wide    6  / 7
 3  Host   SCSI     80M    Wide    4  / 3    <-- Both Primary/Secondary IDs
 6  Drive  Unknown  1G     Narrow  NA / NA
 7  Host   LAN      N/A    Serial  NA / NA

and the lun-maps:

  • lun-maps
Ch  Tgt  LUN  ld/lv  ID-Partition  Assigned  Filter Map
-------------------------------------------------------
 1    0    0  ld0    766975E3-00   Primary
 1    0    1  ld1    6215C0B5-00   Primary
 3    4    0  ld0    766975E3-00   Primary
 3    4    1  ld1    6215C0B5-00   Primary

The above lun-maps output shows that only the Primary controller IDs are utilized,
although secondary controller IDs are also specified.
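
The comparison described above (host channel IDs that appear in the channels output but have no entry in the lun-maps output) can be sketched in Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, the text is assumed to be whitespace-separated as in the example output, and the sample data is copied from this document with the annotations removed.

```python
# Sample output copied from the example configuration above.
CHANNELS = """\
Ch Type Media Speed Width PID / SID
0 Drive SCSI 80M Wide 6 / 7
1 Host SCSI 80M Wide 0 / 5
2 Drive SCSI 80M Wide 6 / 7
3 Host SCSI 80M Wide 4 / 3
6 Drive Unknown 1G Narrow NA / NA
7 Host LAN N/A Serial NA / NA
"""

LUN_MAPS = """\
Ch Tgt LUN ld/lv ID-Partition Assigned Filter Map
1 0 0 ld0 766975E3-00 Primary
1 0 1 ld1 6215C0B5-00 Primary
3 4 0 ld0 766975E3-00 Primary
3 4 1 ld1 6215C0B5-00 Primary
"""


def host_target_ids(channels_text):
    """Return the set of (channel, target ID) pairs defined on Host channels."""
    ids = set()
    for line in channels_text.splitlines():
        fields = line.split()
        # Data rows look like: Ch Type Media Speed Width PID / SID
        if len(fields) < 8 or not fields[0].isdigit() or fields[1] != "Host":
            continue
        for target in (fields[5], fields[7]):  # PID and SID columns
            if target.isdigit():  # skips "NA"
                ids.add((int(fields[0]), int(target)))
    return ids


def mapped_target_ids(lun_maps_text):
    """Return the set of (channel, target ID) pairs that have a LUN mapped."""
    ids = set()
    for line in lun_maps_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
            ids.add((int(fields[0]), int(fields[1])))
    return ids


def unmapped_host_ids(channels_text, lun_maps_text):
    """List host channel IDs that are defined but have no mapped LUNs."""
    return sorted(host_target_ids(channels_text) - mapped_target_ids(lun_maps_text))


if __name__ == "__main__":
    for channel, target in unmapped_host_ids(CHANNELS, LUN_MAPS):
        print("channel", channel, "target ID", target, "has no mapped LUNs")
```

For the example configuration this flags ID 5 on channel 1 and ID 3 on channel 3, which are the two SIDs with no mapped LUNs.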

To avoid this problem, remove (un-configure) the SID from the host channels. Because no LUNs are
mapped to the SIDs, this will NOT affect anything else.

Please use the following steps to remove the SID from the above configuration.

An array reset is required, which will cause I/O disruption. Schedule a maintenance window and ensure
there is no host activity before performing the following steps.

1. Telnet into the array
2. Choose "view and edit Scsi channels."
3. Select the host channel on which you want to edit the Primary/Secondary ID.
For our example, it would be channel 1.
4. Choose "view and edit scsi Id."
5. Choose the ID 5 (Secondary Controller)
6. Choose "Delete Channel SCSI ID", select "Yes".
7. You will be prompted to reset the array, select "No".
8. Repeat steps 2 to 6 for host channel 3, choosing ID 3 (Secondary Controller) in step 5.
9. This time, when prompted to reset the array, select "Yes".

After the array is reset, the channels will now look as follows:

  • channels
Ch  Type   Media    Speed  Width   PID / SID
--------------------------------------------
 0  Drive  SCSI     80M    Wide    6  / 7
 1  Host   SCSI     80M    Wide    0  / NA
 2  Drive  SCSI     80M    Wide    6  / 7
 3  Host   SCSI     80M    Wide    4  / NA
 6  Drive  Unknown  1G     Narrow  NA / NA
 7  Host   LAN      N/A    Serial  NA / NA

 

Note: This problem is fixed in firmware patch 4.13B or later and in version 2.x of the SUNWsccli software. After upgrading to the latest firmware and software, it is recommended to specify SIDs only when they are actually used in the array configuration.

 



Please see Bug ID 5007911 and Bug ID 4802207 (SCSI Bus Reset and @daemon.error under heavy load test for 3310).

transport errors, SE3310, explorer, SeExtractor

Change History: Update and Currency
susan.copeland@oracle.com
Change Date: 10/14/10


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.