SuperCluster T5-8 Primary Domain Intermittent Performance Issue: ssh Connections Time Out, Console Connection Very Slow

Asset ID:	1-72-2211410.1
Update Date:	2017-02-27
Keywords:

Solution Type Problem Resolution Sure


Applies to:

Oracle SuperCluster T5-8 Hardware - Version All Versions to All Versions [Release All Releases]
SPARC SuperCluster T4-4 - Version All Versions to All Versions [Release All Releases]
SPARC SuperCluster T4-4 Full Rack - Version All Versions to All Versions [Release All Releases]
SPARC SuperCluster T4-4 Half Rack - Version All Versions to All Versions [Release All Releases]
Oracle SuperCluster M6-32 Hardware - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

The node initially hung and ssh connections failed. The system was rebooted without CRS/RAC/RDBMS running, but the problem recurs at the same time each day.
Subsequent console login sessions through ILOM intermittently show very poor response times.
Starting any data collection process (e.g. Exachk, Explorer, GUDS) makes the problem worse.

Changes

In this case there were no changes. The system was working and then "suddenly" started having problems.

Cause

The ROOT CAUSE was a slow, poorly performing LUN in the LDom's boot rpool.

This was found by first running GUDS with its output in /var/tmp, then killing it and restarting it with -D /tmp. It ran much better writing to /tmp (memory) than to /var/tmp (on the root rpool).

From iostat -xcnz (<<<< marks the slow disk):

                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   37.0    0.0 1284.9  0.0  0.7    0.0   18.2   0  13 c0t5000CCA01672EECCd0
    0.0   24.0    0.0  123.4  0.0  4.2    0.0  173.3   0  99 c0t5000CCA0167440E8d0 <<<<
    0.0  109.9    0.0 5180.6  0.0  2.2    0.0   20.4   0  36 c0t5000CCA0166BE1FCd0
     cpu
 us sy wt id
  0  0  0 100
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  5.0    0.0    0.0   0 100 c0t5000CCA0167440E8d0 <<<<
    0.0   16.0    0.0   64.0  0.0  0.1    0.0    4.3   0   7 c0t5000CCA0166BE1FCd0
     cpu
 us sy wt id
  0  0  0 100
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0   12.0    0.0  861.8  0.0  4.6    0.0  387.1   0 100 c0t5000CCA0167440E8d0 <<<<
    0.0    5.0    0.0   20.0  0.0  0.0    0.0    5.8   0   3 c0t5000CCA0166BE1FCd0

o. The kernel team confirmed this during core file analysis: they saw ZFS I/Os timing out to the same device.

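o. Note that %b (percent busy) and asvc_t (average service time of active I/Os, in ms) are very high for this LUN even though almost no I/O is being done. Twelve writes per second is essentially no load, yet this drive cannot keep up; in the second sample it is 100% busy with 5 commands active and none completing.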
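To spot this pattern without reading every interval by eye, the iostat output can be filtered for devices that are nearly saturated while completing very little I/O. The sketch below is a minimal sh/awk filter, assuming the 11-column device rows that iostat -xn prints (r/s through device, as above). The function name flag_slow_disks and the default thresholds (%b of at least 90 with fewer than 50 reads plus writes per second) are illustrative assumptions, not values taken from this note.

```shell
# flag_slow_disks [BUSY_MIN] [IOPS_MAX]
# Hypothetical helper: reads `iostat -xn`-style output on stdin and prints
# devices that are nearly saturated (%b) while doing very little I/O.
# Default thresholds (90 %b, < 50 r/s + w/s) are assumptions; tune locally.
flag_slow_disks() {
    awk -v busy_min="${1:-90}" -v iops_max="${2:-50}" '
        # Device rows have 11 fields and start with a number (r/s). This
        # skips the "extended device statistics" banner, the column header,
        # and the cpu lines that the -c option adds.
        NF == 11 && $1 ~ /^[0-9.]+$/ {
            iops = $1 + $2                      # r/s + w/s
            if ($10 + 0 >= busy_min + 0 && iops < iops_max + 0)
                printf "%s busy=%s%% asvc_t=%sms iops=%s\n", $11, $10, $8, iops
        }'
}
```

For example, iostat -xcnz 5 | flag_slow_disks would print one line per interval in which a device matched. On the samples above (without the <<<< annotations), only c0t5000CCA0167440E8d0 is reported, once in each of the three intervals.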

Solution

Detach the slow disk from the rpool mirror, replace it with a new disk, and attach the new disk to the mirror.

Steps:

o. c0t5000CCA01672EECCd0 is the primary mirror.
o. c0t5000CCA0167440E8d0 is the secondary mirror: the BAD/slow-performing disk.
o. c0t5000CCA057A0A9BCd0 is the new disk that replaced the secondary mirror.
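Detaching one side of a two-way mirror leaves rpool with a single copy, so before step 1 it may be worth confirming that the surviving side is healthy. The sketch below reads zpool status output and prints the STATE column for one vdev. The function name device_state is hypothetical, and it assumes the usual config layout in which each vdev row starts with its name followed by STATE, READ, WRITE and CKSUM.

```shell
# device_state DEVICE
# Hypothetical helper: reads `zpool status` output on stdin and prints the
# STATE column for the named vdev. Exits non-zero if the device is absent.
device_state() {
    awk -v dev="$1" '
        # Config rows: NAME STATE READ WRITE CKSUM
        $1 == dev { print $2; found = 1 }
        END { exit !found }'
}
```

For example, only proceed to the detach if zpool status rpool | device_state c0t5000CCA01672EECCd0s0 prints ONLINE.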

1. Detach the 'bad' mirror:

# zpool detach rpool c0t5000CCA0167440E8d0s0

2. Replace the disk and wait until the replacement is complete.
-> The process requires shutting down the SSC node, replacing the disk, and rebooting the node.
-> Hot swap is not supported on SuperCluster.

3. Confirm that the new disk is visible in /dev/rdsk:

# ls -lah /dev/rdsk/c0t5000CCA057A0A9BCd0s2

4. Run format in expert mode, select the new disk, and label it (expert mode lets you choose the label type):

# format -e

5. Copy the VTOC from the primary mirror to the new disk so that the partition tables match:

# prtvtoc /dev/rdsk/c0t5000CCA01672EECCd0s2 | fmthard -s - /dev/rdsk/c0t5000CCA057A0A9BCd0s2
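To check that fmthard applied the same slice layout, the two VTOCs can be compared. The sketch below assumes each disk's prtvtoc output has first been saved to a file (the file names in the usage line are hypothetical). It ignores lines starting with '*', which hold per-disk header information such as the device name, and compares only the slice table, with whitespace normalized. The function names are illustrative.

```shell
# slice_table FILE: print the non-comment (slice) lines of a saved
# prtvtoc listing, collapsing runs of whitespace to single spaces.
slice_table() {
    awk '!/^\*/ { $1 = $1; print }' "$1"
}

# vtocs_match FILE1 FILE2
# Hypothetical helper: exit 0 if the slice tables are identical,
# 1 if they differ, 2 if either file cannot be read.
vtocs_match() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    [ "$(slice_table "$1")" = "$(slice_table "$2")" ]
}
```

For example, save prtvtoc /dev/rdsk/c0t5000CCA01672EECCd0s2 to /tmp/primary.vtoc and prtvtoc /dev/rdsk/c0t5000CCA057A0A9BCd0s2 to /tmp/new.vtoc, then run vtocs_match /tmp/primary.vtoc /tmp/new.vtoc.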

6. Attach the new disk to rpool:

# zpool attach rpool c0t5000CCA01672EECCd0s0 c0t5000CCA057A0A9BCd0s0

The first device is the existing primary rpool mirror; the second is the new disk, which becomes the secondary rpool mirror.

7. Wait for resilvering to complete, monitoring progress with:

# zpool status rpool
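Instead of rerunning zpool status by hand, the wait in step 7 can be scripted as a polling loop. The sketch below assumes that, while resilvering, the scan line of zpool status contains the phrase "resilver in progress" (as on Solaris 11); verify the exact wording on your release. The function names and the 60-second default interval are illustrative assumptions.

```shell
# resilver_in_progress: hypothetical helper that reads `zpool status`
# output on stdin and succeeds (exit 0) while a resilver is still running.
resilver_in_progress() {
    grep -q 'resilver in progress'
}

# wait_for_resilver POOL [INTERVAL]
# Poll the pool every INTERVAL seconds (default 60, an arbitrary choice)
# until the resilver finishes, then show the final status.
wait_for_resilver() {
    pool=$1
    interval=${2:-60}
    while zpool status "$pool" | resilver_in_progress; do
        sleep "$interval"
    done
    zpool status "$pool"
}
```

For example, wait_for_resilver rpool 300 checks every five minutes and prints the final pool status once the scan line no longer reports a resilver in progress.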

References

<BUG:25198355> - SSH CONNECTIONS SLOW AND EVENTUALLY TIME OUT INTERMITTENTLY ON NODE 1
<BUG:15654938> - SUNBT6967781 TXG_SYNC_THREAD IS BLOCKING, EVEN THOUGH THERE IS NO I/O ERROR.
<NOTE:2185936.1> - ldm commands on Control Domain hanging on ZFS, customer unable to run Explorer
