
Asset ID: 1-72-1624511.1
Update Date: 2018-05-25
Keywords:

Solution Type: Problem Resolution

Solution 1624511.1: Sun Storage 7000 Unified Storage System: Readzilla goes Offline - nv_sata hangs when there is only One SSD Configured and power reset occurs


Related Items
  • Sun Storage 7410 Unified Storage System
  • Sun Storage 7310 Unified Storage System
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS




In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-5921144141>

Applies to:

Sun Storage 7410 Unified Storage System - Version All Versions and later
Sun Storage 7310 Unified Storage System - Version All Versions and later
7000 Appliance OS (Fishworks)

Symptoms

The history so far shows several NFS outages and performance degradation, ending in clients losing their connections. The customer's workaround is to fail over the cluster until the symptoms appear on that head as well.

A failback then restores NFS availability again for an indeterminate period.

The customer suspects that this is related to the update performed 4 weeks ago. There are two additional SRs open: one is performance related, the other dealt with a failed cluster failover.

The BUI becomes unresponsive or inaccessible; the CLI may or may not still be working.

 

The configured readzilla (cache device) also goes offline:

  pool: pool-0
 state: ONLINE

NAME STATE READ WRITE CKSUM
pool-0 ONLINE 0 0 0
  mirror-0 ONLINE 0 0 0
  ........
  mirror-9 ONLINE 0 0 0
  c4t5000C5001589DCEBd0 ONLINE 0 0 0 0933QBK00D bay=9
  c4t5000C50015888B82d0 ONLINE 0 0 0 0933QBK00D bay=6
logs
  mirror-10 ONLINE 0 0 0
  c4tATASTECZEUSIOPS018GBYTESSTM0000D1CF0d0 ONLINE 0 0 0 0933QBK00D bay=8
  c4tATASTECZEUSIOPS018GBYTESSTM0000D8FA9d0 ONLINE 0 0 0 0933QBK00D bay=12
cache
  c0t0d0 UNAVAIL 0 0 0 cannot open                <<<<<<<<
spares
  c4t5000C500158813CCd0 AVAIL 0933QBK00D bay=11
  c4t5000C50015863174d0 AVAIL 0933QBK00D bay=10

 

The customer reported the system was hanging and generated a crash dump via NMI.

We have 21 threads waiting for a mutex:

> ::stacks -a -c mutex_vector_enter |::findstack -v !grep mutex_vector_enter |awk '{print $2}' |sort |uniq -c |sort -nr
  16 mutex_vector_enter+0x261(fffff6000f1a06c0)   
   3 mutex_vector_enter+0x261(fffff6001cf76500)   
   1 mutex_vector_enter+0x261(fffff60096a8aae8)   
   1 mutex_vector_enter+0x261(fffff600623ff6a0)                 

 

> fffff6000f1a06c0::print mutex_impl_t m_adaptive._m_owner  
m_adaptive._m_owner = 0xffffff007a59dc41                    
> fffff6001cf76500::print mutex_impl_t m_adaptive._m_owner
m_adaptive._m_owner = 0xffffff007acdfc41
> fffff60096a8aae8::print mutex_impl_t m_adaptive._m_owner
m_adaptive._m_owner = 0xfffff6002d1d0841
> fffff600623ff6a0::print mutex_impl_t m_adaptive._m_owner
m_adaptive._m_owner = 0xffffff007af42c41

 

Subtracting 1 from m_adaptive._m_owner (i.e. clearing the low-order waiters bit) gives the address of the thread that holds the mutex.

So we have 16 threads waiting on thread ffffff007a59dc40.  Let's call this thread A.
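
For example, applying that arithmetic to the first owner value in mdb recovers thread A's address (a minimal illustration, not copied from this dump; the ::mutex dcmd would also report the owner thread directly):

> 0xffffff007a59dc41-1=K
                ffffff007a59dc40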

> ffffff007a59dc40::findstack -v         
stack pointer for thread ffffff007a59dc40: ffffff007a59d250
[ ffffff007a59d250 _resume_from_idle+0xf4() ]     
  ffffff007a59d280 swtch+0x150()              
  ffffff007a59d2b0 preempt+0xd7()          
  ffffff007a59d2e0 kpreempt+0x93(1)          
  ffffff007a59d310 sys_rtt_common+0x140(ffffff007a59d320)        
  ffffff007a59d320 _sys_rtt_ints_disabled+8()      
  ffffff007a59d460 mutex_enter+0x10()             
  ffffff007a59d480 untimeout_default+0x15(16817c4b9, 0)       
  ffffff007a59d4c0 tcp_timeout_cancel+0x38(fffff600730bc040, fffff600b16e3108)
  ffffff007a59d640 tcp_input_data+0x2ea4(fffff600730bc040, fffff6003dd81540, fffff6001db74940, ffffff007a59d690)
  ffffff007a59d860 squeue_drain+0x1f8(fffff6001db74940, 4, 1648bb8d0b07)     
  ffffff007a59d8f0 squeue_enter+0x4fe(fffff6001db74940, fffff600d93e3ba0, fffff600d93e3ba0, 1, 0, 4, 2d)   
  ffffff007a59d9a0 tcp_input_listener+0xb02(fffff60061727000, fffff6004c090060, fffff6001db74dc0, ffffff007a59d9f0)  
  ffffff007a59dbc0 squeue_drain+0x1f8(fffff6001db74dc0, 2, 16454f5caf15)  
  ffffff007a59dc20 squeue_worker+0x132(fffff6001db74dc0)              
  ffffff007a59dc30 thread_start+8()   

 

We have 3 threads waiting on this thread.  Let's call this thread B.

> 0xffffff007acdfc40::findstack -v
stack pointer for thread ffffff007acdfc40: ffffff007acde7a0  
[ ffffff007acde7a0 resume_from_intr+0xb7() ]                   
  ffffff007acde7d0 swtch+0x8e()                    
  ffffff007acde880 turnstile_block+0x760(0, 0, fffff600623ff6a0, fffffffffbc07e48, 0, 0)
  ffffff007acde8e0 mutex_vector_enter+0x261(fffff600623ff6a0)       
  ffffff007acde930 ncec_lookup_illgrp+0xa3(fffff6003932a068, ffffff007acde940, fffff600623ff650)
  ffffff007acde980 ncec_lookup_illgrp_v4+0x8f(fffff6003932a068, ffffff007acde9e8)  
  ffffff007acdec20 arp_process_packet+0x444(fffff6003932a068, fffff6010b0f0680)
  ffffff007acdec70 arp_rput+0xc8(fffff60035555530, fffff6010b0f0680)     
  ffffff007acdece0 putnext+0x21e(fffff600355557c0, fffff6010b0f0680)
  ffffff007acded50 dld_str_rx_unitdata+0xdd(fffff6003555b990, 0, fffff60046117a00, ffffff007acdedb0)  
  ffffff007acdee40 i_dls_link_rx+0x2e7(fffff600349eb6c8, 0, fffff60046117a00, 0)
  ffffff007acdee80 mac_rx_deliver+0x5d(fffff60037e23650, 0, fffff60046117a00, 0)  
  ffffff007acdef10 mac_rx_soft_ring_process+0x17a(fffff60037e23650, fffff60038264340, fffff60046117a00, fffff60046117a00, 1, 0)     
  ffffff007acdf7d0 mac_rx_srs_fanout+0x823(fffff60037fad340, fffff60046117a00)
  ffffff007acdf850 mac_rx_srs_drain+0x261(fffff60037fad340, 800)
  ffffff007acdf8e0 mac_rx_srs_process+0x180(fffff60021f7aa18, fffff60037fad340, fffff60046117a00, 0)                                
  ffffff007acdf930 mac_rx_classify+0x159(fffff60021f7aa18, fffff60021ed2008, fffff60046117a00)                                      
  ffffff007acdf990 mac_rx_flow+0x54(fffff60021f7aa18, fffff60021ed2008, fffff60046117a00)
  ffffff007acdf9e0 mac_rx_common+0x1f6(fffff60021f7aa18, fffff60021ed2008, fffff60046117a00)                                        
  ffffff007acdfa30 mac_rx+0xac(fffff60021f7aa18, fffff60021ed2008, fffff60046117a00)
  ffffff007acdfa70 mac_rx_ring+0x4c(fffff60021f7aa18, fffff60021ed2008, fffff60046117a00, 0)                                        
  ffffff007acdfb90 nxge_rx_intr+0x559(fffff60021d74c40, fffff6001e558000)
  ffffff007acdfbe0 av_dispatch_autovect+0x7c(3e)  
  ffffff007acdfc20 dispatch_hardint+0x33(3e, 0)    
  ffffff007aca9a70 switch_sp_and_call+0x13()  
  ffffff007aca9ac0 do_interrupt+0xeb(ffffff007aca9ad0, 1)
  ffffff007aca9ad0 _interrupt+0xba()             
  ffffff007aca9bc0 mach_cpu_idle+6()             
  ffffff007aca9bf0 cpu_idle+0xbe()              
  ffffff007aca9c20 idle+0x112()               
  ffffff007aca9c30 thread_start+8()  

 

We have one thread waiting on this thread.  Let's call it C.

> 0xfffff6002d1d0840::findstack -v
stack pointer for thread fffff6002d1d0840: ffffff007d5db8b0  
[ ffffff007d5db8b0 _resume_from_idle+0xf4() ]     
  ffffff007d5db8e0 swtch+0x150()           
  ffffff007d5db990 turnstile_block+0x760(fffff6001b8ddcd8, 0, fffff6000f1a06c0, fffffffffbc07e48, 0, 0)                             
  ffffff007d5db9f0 mutex_vector_enter+0x261(fffff6000f1a06c0)
  ffffff007d5dba80 timeout_generic+0x83(1, fffffffff7ab7870, fffff6008e5c5b88, 53d1ac1000, 989680, 0)                               
  ffffff007d5dbab0 timeout+0x5b(fffffffff7ab7870, fffff6008e5c5b88, 8ca0)
  ffffff007d5dbb10 mir_timer_start+0xbf(fffff6008e5c5b88, fffff60096a8aa80, 57e40)
  ffffff007d5dbb30 mir_svc_idle_start+0x33(fffff6008e5c5b88, fffff60096a8aa80)
  ffffff007d5dbb70 mir_svc_release+0x98(fffff6008e5c5b88, 0)              
  ffffff007d5dbbd0 svc_run+0x1aa(fffff6006141bbe0)                
  ffffff007d5dbc00 svc_do_run+0x81(1)  
  ffffff007d5dbec0 nfssys+0x760(e, fe760fbc)
  ffffff007d5dbf10 sys_syscall32+0xff()

 

We have one thread waiting on this thread.  Let's call it D.

> 0xffffff007af42c40::findstack -v
stack pointer for thread ffffff007af42c40: ffffff007af427a0   
[ ffffff007af427a0 _resume_from_idle+0xf4() ]       
  ffffff007af427d0 swtch+0x150()         
  ffffff007af42880 turnstile_block+0x760(fffff6001b8ddcd8, 0, fffff6000f1a06c0, fffffffffbc07e48, 0, 0)                             
  ffffff007af428e0 mutex_vector_enter+0x261(fffff6000f1a06c0)     
  ffffff007af42970 timeout_generic+0x83(1, fffffffff7ef3358, fffff600623ff650, 45d964b800, 989680, 0)                               
  ffffff007af429a0 timeout+0x5b(fffffffff7ef3358, fffff600623ff650, 7530)
  ffffff007af429d0 nce_start_timer+0x67(fffff600623ff650, 493e0)
  ffffff007af42a10 nce_restart_timer+0x5a(fffff600623ff650, 493e0)
  ffffff007af42af0 nce_timer+0x25c(fffff600623ff650)
  ffffff007af42b30 callout_list_expire+0x77(fffff6000f1a0240, fffff6008eb9d4c0)
  ffffff007af42b60 callout_expire+0x31(fffff6000f1a0240)
  ffffff007af42b80 callout_execute+0x1e(fffff6000f1a0240)
  ffffff007af42c20 taskq_thread+0x248(fffff6001df6a690)
  ffffff007af42c30 thread_start+8()                               

 

So it seems that B is waiting on D, and C and D are waiting on A.

All the stacks seem to be network related, but I am unable to determine what A is waiting on.

 

It has been confirmed that bugs 15805888, 16748459 and 16697917 represent the same issue with the nv_sata driver on ZFSSA.

The current working hypothesis is that this issue happens if a ZFSSA head runs only one readzilla. Adding a second readzilla drive (or even physically removing all readzillas) may work around this problem.


The hypothesis also states that this issue may be reproducible by forcing an nv_power_reset on the readzilla drive.

    16697917 - Primary head in 7410 cluster panicked 3 times in one day.

    16748459 - 7410 nv_sata port in NV_RESTORE state/cache disk REMOVED (latest 2011.1.5.0 IDR)    

    15805888 - nv_sata hangs when there is only one drive configured and power reset occurs


To check the status of the 'nv_sata' ports:

    # mdb -k

    > *nv_statep::walk softstate | ::print nv_ctl_t nvc_port[0] nvc_port[1] | ::print nv_port_t nvp_state

    Normal status is 0x0.
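
On a healthy head, each walked port should therefore report a zero state. The output below is an illustrative sketch (not taken from this system); a non-zero value would indicate a port stuck in a reset state, as seen in the bugs above:

    > *nv_statep::walk softstate | ::print nv_ctl_t nvc_port[0] nvc_port[1] | ::print nv_port_t nvp_state
    nvp_state = 0
    nvp_state = 0
    ...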

 

 

Cause

Bug 15805888 - nv_sata hangs when there is only one drive configured and power reset occurs

Here's what happens:

  1. An nv_sata-driven HBA with multiple ports is configured with only one drive installed.
  2. nv_power_reset is triggered on the 8th reset retry.
  3. nv_power_reset results in a reset of all ports on the nv_sata-driven HBA, regardless of whether a drive is installed in that port.
  4. The NV_POWER_RESET|NV_RESET flags are not cleared after ending signature acquisition on ports that do not have drives installed.
  5. Interrupts for the empty ports are then handled improperly because of the stale NV_POWER_RESET flag.


The solution is to clear NV_POWER_RESET|NV_RESET states from empty ports after ending signature acquisition.

I reproduced the bug in the lab and tested the fix.

 

Solution

Upgrade to the 2011.1.8.1 release (or later).
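
To verify which release a head is actually running before and after the update, the appliance CLI's maintenance context can be used; a minimal sketch, assuming the standard update listing is available on this release:

    hostname:> maintenance system updates show

The update entry whose status reads 'current' should correspond to the release the head is booted from.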

 

 

***Checked for relevance on 25-MAY-2018***

References

<BUG:15805888> - NV_SATA HANGS WHEN THERE IS ONLY ONE DRIVE CONFIGURED AND POWER RESET OCCURS

Attachments
This solution has no attachment