Solution Type: Problem Resolution (Sure Solution)
Solution 1457578.1: Sun Storage 7000 Unified Storage System: When Replacing a Faulted Readzilla SSD and/or System Disk in the Head Unit, the Replacement Is Not Recognized
Created from <SR 3-5635188061>

Applies to:
Sun Storage 7410 Unified Storage System - Version All Versions and later
Sun Storage 7210 Unified Storage System - Version All Versions and later
Sun Storage 7310 Unified Storage System - Version All Versions and later
7000 Appliance OS (Fishworks)

Symptoms
When replacing a read cache SSD (readzilla) or a system disk in the head unit of a 7000 series ZFS Storage Appliance, the new disk may not be recognised, and it may appear as if the replacement SSD is also faulty.
Changes
Replacing a faulted system disk or "readzilla" read cache SSD can trigger the problem.
Cause
This is caused by a known problem with the nv_sata SATA driver used by the SSDs and HDDs in the head unit. The problem leads to the nv_sata port associated with the drive becoming locked up.
Check for the locked nv_sata port with mdb -k:

  > *nv_statep::walk softstate | ::print nv_ctl_t nvc_port[0] nvc_port[1] | ::print nv_port_t nvp_state
  nvp_state = 0
  nvp_state = 0
  nvp_state = 0
  nvp_state = 0
  nvp_state = 0
  nvp_state = 0
  nvp_state = 0
  nvp_state = 0
  nvp_state = 0x4c    <---- PORT LOCKED IN RESET
  nvp_state = 0
  nvp_state = 0
  nvp_state = 0
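The same check can also be run non-interactively by piping the dcmd chain into mdb -k, in the same way the drive-identification command further below is run. A minimal sketch of that usage:

  # Sketch: same dcmd chain as above, run non-interactively from the shell.
  # Any non-zero nvp_state (for example 0x4c or 0x24) points at the locked port.
  echo '*nv_statep::walk softstate | ::print nv_ctl_t nvc_port[0] nvc_port[1] | ::print nv_port_t nvp_state' | mdb -k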
Also see Bug 16740884 (nv_sata should clear failed state when port activation function is called):

  > *nv_statep::walk softstate | ::print nv_ctl_t nvc_port[0] nvc_port[1] | ::print nv_port_t nvp_state
  fffff6001d32f350 int nvp_state = 0
  fffff6001d32f540 int nvp_state = 0
  fffff6001cecba90 int nvp_state = 0
  fffff6001cecbc80 int nvp_state = 0
  fffff6001d48b810 int nvp_state = 0
  fffff6001d48ba00 int nvp_state = 0
  fffff6001d48bc90 int nvp_state = 0
  fffff6001d48be80 int nvp_state = 0
  fffff6000f38d3d0 int nvp_state = 0
  fffff6000f38d5c0 int nvp_state = 0x24
  fffff6001d32e5d0 int nvp_state = 0
  fffff6001d32e7c0 int nvp_state = 0

  > ::sata_dmsg_dump
  [2013 Aug 20 00:17:24:659:487:365] nv_sata4: SATA port 1 error
  [2013 Aug 20 00:17:24:659:506:767] nv_sata4: SATA port 1 error
  [2013 Aug 20 01:17:36:581:334:092] nv_sata4: SATA port 1 error
  [2013 Aug 20 01:17:36:581:354:739] nv_sata4: SATA port 1 error
  [2013 Aug 20 01:28:19:063:788:269] nv_sata4: SATA port 1 error
  [2013 Aug 20 01:28:19:063:809:112] nv_sata4: SATA port 1 error
  [2013 Aug 20 01:30:09:055:421:116] nv_sata4: SATA port 1 error
  [2013 Aug 20 01:30:09:055:440:916] nv_sata4: SATA port 1 error
  [2013 Aug 20 01:34:55:959:395:827] nv_sata4: SATA port 1 error
  [2013 Aug 20 01:34:55:959:451:522] nv_sata4: SATA port 1 error
  [2013 Aug 20 01:34:55:959:541:466] nv_sata4: SATA port 1 error
  [2013 Aug 20 01:34:55:959:636:821] nv_sata4: SATA port 1 error
  [2013 Aug 20 01:34:55:959:754:796] nv_sata4: SATA port 1 error
  [2013 Aug 20 01:40:26:799:626:057] nv_sata4: SATA port 1 error
  [2013 Aug 20 01:40:26:799:647:159] nv_sata4: SATA port 1 error
According to an update in Bug 19905716, it is possible to identify which drive has the faulty nv_sata port status:

  echo '*nv_statep::walk softstate t | ::print nv_ctl_t \
  nvc_port[0].nvp_slot->nvslot_v_addr|::grep .>0 | ::eval "<t::print nv_ctl_t \
  nvc_port[0].nvp_ctlp |::print struct nv_ctl nvc_ctlr_num;<t::print nv_ctl_t \
  nvc_port[0].nvp_port_num;<t::print nv_ctl_t nvc_port[0].nvp_state"' | mdb -k

This is for drives on c0. For drives on c1, for example, replace nvc_port[0] with nvc_port[1] (a spelled-out c1 variant is sketched after the example output below).

  nvc_ctlr_num = 0x1               ====> c1
  nvc_port[0].nvp_port_num = 0x2   ====> t2
  nvc_port[1].nvp_state = 0x24     ====> port state of c1t2 is 0x24

So if we suspect a specific drive, we can display the status for that drive and verify that it is indeed the one with the faulty status.
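Following the substitution noted above, a sketch of the same one-liner adjusted for drives on c1 (only the port index changes):

  # Sketch only: the c0 one-liner above with nvc_port[0] replaced by nvc_port[1], as described above.
  echo '*nv_statep::walk softstate t | ::print nv_ctl_t \
  nvc_port[1].nvp_slot->nvslot_v_addr|::grep .>0 | ::eval "<t::print nv_ctl_t \
  nvc_port[1].nvp_ctlp |::print struct nv_ctl nvc_ctlr_num;<t::print nv_ctl_t \
  nvc_port[1].nvp_port_num;<t::print nv_ctl_t nvc_port[1].nvp_state"' | mdb -k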
NOTE: Bug 16740884 is also related to the nv_sata port being locked in a failed state and requires a reboot to clear. The fix is in Solaris 11.2 and 12. A backport to the 2011.1.x code is being worked on via Bug 18390843.
Also see Bug 16908038 (nv_sata hangs when there is only one drive configured and power reset occurs)
Solution
Work is ongoing to resolve issues in this area and several bugs have already been fixed. Please ensure that 7x10 systems are running the LATEST Appliance Firmware Release, which contains all of the latest bug fixes. For the latest Appliance Release versions, see Doc ID 2021771.1 - Oracle ZFS Storage Appliance: Software Updates.
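As an illustrative sketch only, the currently installed software release can be compared against the latest release listed in Doc ID 2021771.1 from the appliance CLI; the 'maintenance system updates' context used here is assumed to be available on the running release, and 'hostname' is a placeholder prompt:

  # Sketch, assuming the standard appliance CLI: list installed/available software updates,
  # then compare the current entry against the latest release in Doc ID 2021771.1.
  hostname:> maintenance system updates show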
The workaround to release the nv_sata port is to reboot the system. The reboot can be scheduled for a quiet period to minimize the impact on production.
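A minimal sketch of issuing that reboot from the appliance CLI during the agreed maintenance window (assuming the standard 'maintenance system reboot' command; confirm the command and any confirmation prompts on the running release, and 'hostname' is a placeholder prompt):

  # Sketch: reboot the affected head from the appliance CLI to release the locked nv_sata port.
  # Schedule this for a quiet period to minimize production impact.
  hostname:> maintenance system reboot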
Moving the readzillas to empty slots and reconfiguring the pool is no longer advised because of the possibility of inducing a panic - see Document 1504807.1 for details.
Note: During any maintenance and/or troubleshooting activities, please consider 'deactivating' the ASR functionality of the asset - see Doc ID 1508403.1. This will prevent additional Service Requests from being created improperly.
NOTE: The solution procedure outlined in Doc ID 2157494.1 (Oracle ZFS Storage Appliance: Disks or readzillas are not recognised after replacement in the 7x20/ZS3-x/ZS4-x head) may also be applicable in some situations.
Note from TSC Engineer: I had this with a 7410 running 2011.04.24.9.0,1-1.46. A reboot works. I also tried offlining/onlining the disk in the shell (zpool offline <POOL> <DEVICE>) without a reboot, and that worked as well.
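Spelled out with placeholder names (the pool and device names below are illustrative only), the sequence the engineer describes would look like:

  # Placeholder pool/device names - substitute the real pool and the suspect readzilla/system disk device.
  zpool offline pool-0 c1t2d0    # take the device offline, as in the note above
  zpool online pool-0 c1t2d0     # bring the replacement device back online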
***Checked for relevance on 24-MAY-2018***

References
<NOTE:1504807.1> - Sun Storage 7000 Unified Storage System: Failing or moving readzilla SSD devices can lead to a panic
<NOTE:1532289.1> - Sun Storage 7000 Unified Storage System: Sufficient installed RAM is required for efficient use/performance of installed Read SSDs (readzilla)
<NOTE:1506500.1> - Sun Storage 7000 Unified Storage System: Alerts from readzilla cannot be cleared - even after device replacement or reboot
<NOTE:1333120.1> - Sun Storage 7000 Unified Storage System: How to add L2ARC cache SSDs (Readzillas) to a pool
<BUG:16097849> - SUNBT7200843 NV_SATA PORT HELD IN RESET CANNOT REPLACE CACHE DISK ON 7410 RUNNIN
<BUG:16740884> - NV_SATA SHOULD CLEAR FAILED STATE WHEN PORT ACTIVATION FUNCTION IS CALLED

Attachments
This solution has no attachment