Sun Storage 2500, 2500-M2, and 6000 Arrays: Performance is Slow on an Apparently Optimal Array

Asset ID:	1-72-1674848.1
Update Date:	2018-04-20
Keywords:

Solution Type Problem Resolution Sure

Applies to:

Sun Storage 2530-M2 Array - Version Not Applicable and later
Sun Storage Flexline 380 Array - Version Not Applicable and later
Sun Storage 6580 Array - Version All Versions and later
Sun Storage 6780 Array - Version All Versions and later
Sun Storage 2510 Array - Version Not Applicable and later
Information in this document applies to any platform.

Symptoms

A performance problem has been isolated to a Sun Storage 2500, 2500-M2, or 6000 array. Any host or application that accesses data on the array experiences obvious latency, far exceeding normal operation. Monitoring utilities (such as iostat) report the disks as up to 100% busy with high service times. Typical jobs take 3 to 4 times longer to complete. All evidence clearly points to a problem on the array. However, at first glance, Common Array Manager (CAM) reports no problems related to performance:

There are no failed, expired, or charging batteries.
All volumes are on their preferred path.
There are no degraded or reconstructing volumes.
CAM reports read and write cache active for all volumes.
If there are any alarms, they are unrelated to performance (such as firmware not at baseline).

Changes

There have been no obvious changes to the array. There may have been some scheduled maintenance, repairs, or resets, but nothing which would identify a true cause and effect.

Cause

The problem is with the cache on the array. Even though CAM reports read and write cache as active, part of the cache is actually inactive on one controller (in the example output in this document, write cache and cache mirroring). The only way to validate the current cache state in this circumstance is to evaluate the stateCaptureData.dmp file from the supportData. See <Document 1002514.1> Collecting Sun Storage Common Array Manager Support Data for Arrays.
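
As a minimal sketch of pulling the file out of the support data bundle, the following Python function reads the first archive member whose name ends in stateCaptureData.dmp. The function name read_state_capture, the assumption that the bundle is a standard zip archive, and the assumption that the dump decodes as text are illustrative and not taken from CAM documentation.

```python
import zipfile


def read_state_capture(bundle_path, member_suffix="stateCaptureData.dmp"):
    """Return the text of the first archive member whose name ends with member_suffix.

    bundle_path may be a filesystem path or an open binary file object.
    The member layout inside the bundle is an assumption; only the file
    name stateCaptureData.dmp comes from the article.
    """
    with zipfile.ZipFile(bundle_path) as bundle:
        for name in bundle.namelist():
            if name.endswith(member_suffix):
                # Replace undecodable bytes instead of failing on them.
                return bundle.read(name).decode("utf-8", errors="replace")
    raise FileNotFoundError(f"{member_suffix} not found in {bundle_path}")
```

For example, read_state_capture("supportdata.zip") would return the dump contents as a single string, ready to be searched for ccmShowState.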

To correctly interpret the file, several acronyms need to be expanded and some terms explained.

RCE - Read Cache Enabled
RCA - Read Cache Active
RCI - Read Cache Inactive
WCE - Write Cache Enabled
WCA - Write Cache Active
WCI - Write Cache Inactive
CME - Cache Mirroring Enabled
CMA - Cache Mirroring Active
CMI - Cache Mirroring Inactive
CWOB - Cache Without Batteries
Dev x - A volume on the array, where x is the volume number
Owned/Unowned - Whether the controller owns (Owned) or does not own (Unowned) the volume
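
The acronym list above can be applied mechanically to a single "Dev x" line. The sketch below is illustrative only: the names CACHE_FLAGS and decode_device_line are hypothetical, and any token that is not in the list (for example Open or WC_INTERNALLY_DISABLED) is passed through unchanged rather than interpreted.

```python
# Expansions taken from the acronym list in this article.
CACHE_FLAGS = {
    "RCE": "Read Cache Enabled",
    "RCA": "Read Cache Active",
    "RCI": "Read Cache Inactive",
    "WCE": "Write Cache Enabled",
    "WCA": "Write Cache Active",
    "WCI": "Write Cache Inactive",
    "CME": "Cache Mirroring Enabled",
    "CMA": "Cache Mirroring Active",
    "CMI": "Cache Mirroring Inactive",
    "CWOB": "Cache Without Batteries",
}


def decode_device_line(line):
    """Split a 'Dev N: ...' line into (volume number, ownership, expanded flags)."""
    label, _, rest = line.partition(":")
    device = int(label.split()[1])
    tokens = rest.split()
    ownership = tokens[0]  # "Owned" or "Unowned"
    # Unknown tokens are kept as-is so nothing in the line is silently dropped.
    flags = [CACHE_FLAGS.get(token, token) for token in tokens[1:]]
    return device, ownership, flags
```

Calling decode_device_line on a line such as "Dev 1: Owned Open RCE RCA WCE WCA CME CMA" yields the volume number, the ownership word, and the flags spelled out in full.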

Once you unzip the supportdata.zip bundle, view the stateCaptureData.dmp file and look for the command ccmShowState. The command should appear twice, because it is run from each controller. The example output shows the actual cache state of each volume. Notice how the A controller reports all cache states active and enabled. The problem is with the B controller. The cache states on B have been internally disabled (WC_INTERNALLY_DISABLED and CM_INTERNALLY_DISABLED), so write cache and cache mirroring are inactive (WCI and CMI) for all volumes.

FunctionId: CACHE Function: ccmShowState on controller A
Executing ccmShowState(0,0,0,0,0,0,0,0,0,0):

Controller:            B
# Volumes Mirroring:   9
# Volumes w/ECD:       0
MirrorReady:           Yes
AltMirrorReady:        Yes
BatteryEnabledLocally: Yes
BatteryEnabledByAlt:   Yes
Battery Status:        Okay
Alt Battery Status:    Okay

CacheDeviceFlags:
Dev 0: Unowned Open RCE RCA WCE WCA CME CMA
Dev 1: Owned Open RCE RCA WCE WCA CME CMA
Dev 2: Owned Open RCE RCA WCE WCA CME CMA
Dev 3: Unowned Open RCE RCA WCE WCA CME CMA
Dev 4: Unowned Open RCE RCA WCE WCA CME CMA
Dev 5: Unowned Open RCE RCA WCE WCA CME CMA
Dev 6: Unowned Open RCE RCA WCE WCA CME CMA
Dev 7: Owned Open RCE RCA WCE WCA CME CMA
Dev 8: Owned Open RCE RCA WCE WCA CME CMA

FunctionId: CACHE Function: ccmShowState on controller B
Executing ccmShowState(0,0,0,0,0,0,0,0,0,0):

Controller:            A
# Volumes Mirroring:   9
# Volumes w/ECD:       0
MirrorReady:           Yes
AltMirrorReady:        Yes
BatteryEnabledLocally: Yes
BatteryEnabledByAlt:   Yes
Battery Status:        Okay
Alt Battery Status:    Okay

CacheDeviceFlags:
Dev 0: Owned Open RCE RCA WCE WCI CME CMI WC_INTERNALLY_DISABLED CM_INTERNALLY_DISABLED
Dev 1: Unowned Open RCE RCA WCE WCI CME CMI WC_INTERNALLY_DISABLED CM_INTERNALLY_DISABLED
Dev 2: Unowned Open RCE RCA WCE WCI CME CMI WC_INTERNALLY_DISABLED CM_INTERNALLY_DISABLED
Dev 3: Owned Open RCE RCA WCE WCI CME CMI WC_INTERNALLY_DISABLED CM_INTERNALLY_DISABLED
Dev 4: Owned Open RCE RCA WCE WCI CME CMI WC_INTERNALLY_DISABLED CM_INTERNALLY_DISABLED
Dev 5: Owned Open RCE RCA WCE WCI CME CMI WC_INTERNALLY_DISABLED CM_INTERNALLY_DISABLED
Dev 6: Owned Open RCE RCA WCE WCI CME CMI WC_INTERNALLY_DISABLED CM_INTERNALLY_DISABLED
Dev 7: Unowned Open RCE RCA WCE WCI CME CMI WC_INTERNALLY_DISABLED CM_INTERNALLY_DISABLED
Dev 8: Unowned Open RCE RCA WCE WCI CME CMI WC_INTERNALLY_DISABLED CM_INTERNALLY_DISABLED
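
Checking every "Dev x" line by eye is error-prone on arrays with many volumes. The following is a rough sketch of automating the check, assuming the dump text follows the layout of the example output above. The function name find_inactive_cache is hypothetical, and results are keyed on the controller letter in the "ccmShowState on controller X" header line, which is an assumption about how the sections should be labeled.

```python
import re

SECTION_RE = re.compile(r"ccmShowState on controller (\w+)")
DEVICE_RE = re.compile(r"^Dev (\d+):\s+(.*)$")
INACTIVE_FLAGS = {"RCI", "WCI", "CMI"}


def find_inactive_cache(dump_text):
    """Map each ccmShowState section label to {volume number: [inactive flags]}."""
    report = {}
    current = None
    for raw_line in dump_text.splitlines():
        line = raw_line.strip()
        if line.startswith("FunctionId:"):
            # A new section starts; track it only if it is a ccmShowState section.
            header = SECTION_RE.search(line)
            current = header.group(1) if header else None
            if current is not None:
                report.setdefault(current, {})
            continue
        device = DEVICE_RE.match(line)
        if device and current is not None:
            inactive = sorted(INACTIVE_FLAGS.intersection(device.group(2).split()))
            if inactive:
                report[current][int(device.group(1))] = inactive
    return report
```

Run against the example output above, this would return an empty entry for the section headed controller A and list Dev 0 through Dev 8 with CMI and WCI under the section headed controller B.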

Solution

The solution to the problem is to reset the controller whose cache states have been internally disabled. This may be done from the CAM GUI or with the service command.

The service command resides in:

     Windows:    C:\Program Files\Sun\Common Array Manager\Component\fms\bin\service.bat
     Solaris:    /opt/SUNWsefms/bin/service
     Linux:      /opt/sun/cam/private/fms/bin/service

From the example above, we would tailor the service command to reset the B controller.

# service -d <deviceid> -c reset -t <a|b|tXctrlY|tXbatY|mel|rls|ddc|soc>

# /opt/SUNWsefms/bin/service -d arrayname -c reset -t b
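
If the reset needs to be scripted, the command line can be assembled from the paths listed above. This is a sketch under stated assumptions: build_reset_command and SERVICE_PATHS are hypothetical names, the lookup assumes Python's platform.system() reports "SunOS" on Solaris, and the function only builds the argument list without running anything.

```python
import platform

# Install locations as listed in this article, keyed by platform.system() value.
SERVICE_PATHS = {
    "Windows": r"C:\Program Files\Sun\Common Array Manager\Component\fms\bin\service.bat",
    "SunOS": "/opt/SUNWsefms/bin/service",
    "Linux": "/opt/sun/cam/private/fms/bin/service",
}


def build_reset_command(device_id, controller, system=None):
    """Return the argument list for resetting controller 'a' or 'b' on device_id."""
    system = system or platform.system()
    target = controller.lower()
    if target not in ("a", "b"):
        raise ValueError("controller must be 'a' or 'b'")
    if system not in SERVICE_PATHS:
        raise ValueError(f"no known service path for platform {system!r}")
    return [SERVICE_PATHS[system], "-d", device_id, "-c", "reset", "-t", target]
```

For the example above, build_reset_command("arrayname", "b", system="SunOS") produces the same arguments as the Solaris command line shown. Returning a list rather than a single string avoids quoting problems with the spaces in the Windows path.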

From the CAM GUI, navigate as follows:

Select the array -> Physical Devices -> Controllers -> scroll to the correct controller -> click the Reset Controller button
