Asset ID: 1-72-1003900.1
Update Date: 2017-05-02
Keywords:
Solution Type
Problem Resolution Sure Solution
1003900.1: Sun StorEdge 3510 array: Multipathing sometimes fails to provide path redundancy following a controller failure in a dual-controller setup
Related Items
- Sun Storage 3510 FC Array
Related Categories
- PLA-Support>Sun Systems>DISK>Arrays>SN-DK: SE31xx_33xx_35xx
PreviouslyPublishedAs
205469
Applies to:
Sun Storage 3510 FC Array - Version Not Applicable and later
All Platforms
Symptoms
Occasionally, the multipath configuration for a Sun StorEdge[TM] 3510 (SE3510) array in a dual-controller configuration does not work as expected after a single controller failure.
A dual-controller configuration is expected to provide controller redundancy during a controller failure. Host I/O should not be disrupted when dual-path host connections to different RAID I/O modules (IOMs) are set up and multipathing has been configured.
Cause
In an SE3510, both IOMs are on the same internal FCAL loop as both controllers at all times. In other words, each host channel of the top IOM shares a loop with the matching host channel on the bottom IOM.
Each channel of each controller connects to port bypass circuitry (PBC), and the channels are then connected to each other through the midplane.
In addition, the six channel ports are mapped to the three FC controller chips on each IOM as follows:
+-+ +-+ +-+ +-+ +-+ +-+
|0| |1| |2| |2| |4| |5|
+++ +-+ +-+ +-+ +-+ +-+
|
+-@--------+
| | +---------------------------------------------+
| | | Top RAID IOM |
| | |+-----------+ +-----------+ +-----------+|
| | || ISP-2313 | | ISP-2313 | | ISP-2313 ||
| | || +---+---+ | | +---+---+ | | +---+---+ ||
| | || |CH0|CH1| | | |CH2|CH3| | | |CH4|CH5| ||
| | ++-+-+-+---+-+---+-+---+---+-+---+-+---+---+-++
| | |
| @--------+
|
| @--------+
| | |
| | ++-+-+-+---+-+---+-+---+---+-+---+-+---+---+-++
| | || |CH0|CH1| | | |CH2|CH3| | | |CH4|CH5| ||
| | || +---+---+ | | +---+---+ | | +---+---+ ||
| | || ISP-2313 | | ISP-2313 | | ISP-2313 ||
| | |+-----------+ +-----------+ +-----------+|
| | | Bottom RAID IOM |
| | +---------------------------------------------+
+-@--------+
|
+++ +-+ +-+ +-+ +-+ +-+
|0| |1| |3| |3| |4| |5|
+-+ +-+ +-+ +-+ +-+ +-+
NOTE:
Channels 1, 4 and 5 follow the same architecture as Channel 0.
@ : PBC(Port Bypass Circuit)
+-+
|n| : SFP(Small Form-factor Pluggable)
+-+
If dual host paths are connected to the two channels on a single FC controller chip (i.e. CH0 and CH1, and/or CH4 and CH5), the FC chip fails for any reason, and controller failover does not work as expected, then the host connection to the LUNs within the SE3510 will be lost even though the multipathing software was set up correctly.
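The condition above can be expressed as a small check. This is only an illustrative Python sketch, not a Sun tool: the names `HostPort`, `chip_index` and `same_fc_chip` are hypothetical, and the pairing of channels to chips (CH0/CH1, CH2/CH3, CH4/CH5) is read from the diagram above.

```python
from typing import NamedTuple


class HostPort(NamedTuple):
    """A host connection point on the array (hypothetical helper type)."""
    iom: str      # "top" or "bottom" RAID IOM
    channel: int  # channel number, 0 through 5


def chip_index(channel: int) -> int:
    """Return which of the three FC controller chips on an IOM serves a channel.

    Per the diagram, channels are paired as (0, 1), (2, 3), (4, 5).
    """
    if not 0 <= channel <= 5:
        raise ValueError(f"unknown channel: {channel}")
    return channel // 2


def same_fc_chip(a: HostPort, b: HostPort) -> bool:
    """True if both host paths depend on one FC controller chip on one IOM."""
    return a.iom == b.iom and chip_index(a.channel) == chip_index(b.channel)


# CH0 and CH1 on the same IOM share a chip, so that chip is a single point of failure.
print(same_fc_chip(HostPort("top", 0), HostPort("top", 1)))  # True
# CH0 and CH4 are served by different chips.
print(same_fc_chip(HostPort("top", 0), HostPort("top", 4)))  # False
```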
Solution
To avoid such failures on the SE3510, host cables should be connected to different ports on different IOMs and different FC controller chips. For example, if one host path is connected to the CH0 port on the top IOM, the other host path should be connected to the CH4 or CH5 port on the bottom IOM. The Sun StorEdge 3000 Family Best Practices Manual: Sun StorEdge 3510 FC and 3511 SATA Array includes a diagram demonstrating an example of a best practice dual-controller multipath configuration. There is no single best configuration, but that diagram takes into account the issue described in this article.
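The cabling rule above can be sketched as a short helper that lists candidate ports for the second host path. This is a hedged illustration only: `second_path_candidates` is a hypothetical name, `HOST_CHANNELS` is assumed from the channels this article names for host connections (CH0, CH1, CH4, CH5), and the channel pairing per chip is assumed to be (0, 1), (2, 3), (4, 5).

```python
IOMS = ("top", "bottom")
HOST_CHANNELS = (0, 1, 4, 5)  # assumption: the host channels named in this article


def second_path_candidates(iom: str, channel: int) -> list[tuple[str, int]]:
    """List ports for the second host path, given the first path's port.

    A candidate must be on the other IOM and on a channel served by a
    different FC controller chip (channels are paired two per chip).
    """
    if iom not in IOMS:
        raise ValueError(f"unknown IOM: {iom}")
    if channel not in HOST_CHANNELS:
        raise ValueError(f"not a host channel: {channel}")
    other_iom = "bottom" if iom == "top" else "top"
    return [
        (other_iom, ch)
        for ch in HOST_CHANNELS
        if ch // 2 != channel // 2  # different chip position
    ]


# First path on CH0 of the top IOM, as in the example above.
print(second_path_candidates("top", 0))  # [('bottom', 4), ('bottom', 5)]
```

The result for CH0 on the top IOM matches the example in the text: the second path goes to CH4 or CH5 on the bottom IOM.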

Internal Comments
Change History
Date: 2006-01-02
User Name: 97961
Action: Approved
Comment: - Tidied up formatting
- Applied trademarking
Version: 9
Date: 2006-01-02
User Name: 97961
Action: Accept
Comment:
Version: 0
Date: 2006-01-02
User Name: 147406
Action: Approved
Comment: Hi,
The author has pointed out that the port bypass circuitry is not what he intends to discuss in this document and that the matter can be explained without referring to the port bypass circuitry.
Agreeing to the above, I am sending this document ahead for further review.
In addition I have pointed out to the author that the sentence "Each drive loop has three PBC chips, each controlling the signal flow to and from four disk drives." is definitely correct and can be found in the Sun StorEdge 3510 troubleshooting guide under the section of "Port Bypass Circuitry"
regards,
sushil
Version: 0
Date: 2005-12-20
User Name: 31844
Action: Approved
Comment: Removed following description from the published document:
Each drive loop has three PBC chips, each controlling the signal flow to and from four disk drives. Channel 0 and 1 share one PBC, channel 2 and 3 share the second while 4 and 5 share the third.
because the subject of this document is not any drive loop issue and
this description is not correct because each channel loop has never been sharing any PBC, but only the ISP.
Please review this correction.
Version: 0
Date: 2005-12-20
User Name: 31844
Action: Update Started
Comment: Since some typo has been pointed out by audience, I'll update this document to correct that.
Version: 0
Date: 2005-11-08
User Name: 25440
Action: Approved
Comment: Publishing.
Version: 6
Date: 2005-11-08
User Name: 25440
Action: Accept
Comment:
Version: 0
Date: 2005-11-07
User Name: 147406
Action: Approved
Comment: Hi,
The recommended changes have been made and reviewed.
Please review this document.
regards,
sushil
Version: 0
Date: 2005-11-06
User Name: 31844
Action: Approved
Comment: I think your content looks good, especially the title is more apt to express the subject.
Version: 0
Date: 2005-11-02
User Name: 147406
Action: Rejected
Comment: Hi Hanada-san,
Let me know if the following content looks good.
Title: Sun StorEdge[TM] 3510 array: STMS sometimes fails to provide path redundancy following a controller failure in a dual controller set-up.
Problem:
It has been seen that sometimes the dual path STMS configuration for an SE3510 with dual controller configuration did not work as expected when a single controller failure has taken place.
It is expected that a dual controller configuration should exhibit controller redundancy during a controller failure and there should not be any disruption to host I/O when dual path host connections to different RAID I/O modules are set up and STMS has been configured.
Resolution:
In a SE3510, both IOMs are on the same internal FCAL loop as both controllers at all times, ie. each host channel of the top IOM shares a loop with the matching host channel on the bottom IOM.
Each host channel of the top IOM shares a loop with the matching host channel on the bottom IOM.
Each channel of each controller connects to a port bypass circuitry ( PBC) and are then connected to each other through the midplane.
Each drive loop has three PBC chips, each controlling the signal flow to and from four disk drives. Channel 0 and 1 share one PBC, channel 2 and 3 share the second while 4 and 5 share the third.
Each device on the FC is connected to the loop by a port bypass circuit.
Each of the 6 channel ports is mapped to the 3 FC controller chips on each top and bottom IOM like follows:
Diagram....
So,
if the dual host paths are connected to the dual channels on a single FC controller chip, ie. CH0 & CH1 and/or CH4 & CH5
AND
The FC chip fails for any reason
AND
controller failure does not work as expected,
the host connection to the LUNs within this SE3510 will be lost even though the STMS or any other multipathing software was setup correctly.
To avoid such failures on the SE3510, host cables should be connected to the different ports on different IOM and different FC controller chips.
For example, if one host path was connected to the CH0 port on the top IOM, another host path should be connected to the CH4 or CH5 port on the bottom IOM.
Additional Information:
Please refer to document the info doc 77939 for better understanding on internal loop architecture of the SE3510.
The "Sun StorEdge 3000 Family Best Practices Manual, Sun StorEdge 3510 FC Array, Sun StorEdge 3511 SATA Array" manual have the same multipath connections diagram but there is no explicit recommendations of such connections.
Version: 0
Date: 2005-10-24
User Name: 31844
Action: Approved
Comment: Hi Sushil,
Thanks for your suggestions but I think you have something
misunderstandings.
I have nothing to talk about PBC chips within this document but
just talking about ISP-2313 FC controller chips.
The ISP-2313 is not a PBC chip. The PBC chip is an ASIC that is
independent of ISP-2313.
My intention of this document is that ch0 & ch1 pair and ch4 & ch5
pair should not be presented to a dual path host connection use
because each pair is sharing a single ISP-2313 chip, hence the
ISP-2313 chips, not PBC chips, can be an SPOF in such port connection
case.
But I've revised the document considering why this doc was misleading,
so please review again.
Version: 0
Date: 2005-10-24
User Name: 147406
Action: Rejected
Comment: Hi Hanada-san,
Since you intend to talk of the PBC alone and how it can be a SPOF,let us work to make sure that the document only talks of that and does not deviate.
For this reason, the following needs to be cleaned up.
"Some customer has claimed in some escalation case that the dual path STMS configuration for the SE3510 with dual controller configuration did not work as expected against just one RAID I/O Controller Module, that was primary controller in this case, failure even though the customer should setup the STMS as following some manuals properly."
This definitely gives the impression that a single Vs dual is been spoken here.
When you mention,
" AND
- the controller failover was not completed for some reason.
the host connection to the LUNs within this SE3510 will be lost even though the STMS or any other multipathing software was setup correctly."
then I dont quite agree.
Because then the objective of the document which is to ask that the cables should be connected to different PBC's has nothing to do with controller failover mechanism failure.
As it is that sentence does not hold true for single controller arrays. There is no controller failover in single controller setups.
So after you have explained the PBC's, you might want to mention if the cables are connected to Ch0 and ch1 and the PBC chip fails, the luns will be lost.
Ultimately, you might continue mentioning what remedy is to be taken to avoid such situations.
Let me know what you think.
The effort is to keep it clean and concise.
regards,
Sushil
Version: 0
Date: 2005-10-24
User Name: 31844
Action: Approved
Comment: Hi Sushil,
Thanks for your review and comments. Please find my replies inline below:
> Recipient Comment: Hanada-san,
> Could you please explain the objective of this document?
>
> Are you stressing upon the fact that the PBC is a single point of failure when ch0 and ch1 is used for connections?
>
> Or are you stressing upon the possibility of a controller failover mechanism failure in case of a dual controller setup.
I'm not stressing PBC nor controller failover mechanism failure with
this document.
I've just wanted to highlight why ch0 & ch1 should not be used as
dual path connection ports for a single host. Because in such case
a single FC controller chip can be an SPOF.
>
>
> I ask because of the following reasons:
>
> 1. You speak of a single controller stms failover working better than a dual controller setup.
I've never intended to speak of such thing.
Do you mean following description in the Resolution section which
is predicated on a dual controller configuration does not make
sense?
To avoid such LUN lost case on SE3510, dual path should be connected to the different port on the different IOM and the different FC controller chips.
For example, if one host path was connected to the CH0 port on the top IOM, another host path should be connected to the CH4 or CH5 port on the bottom IOM.
>
> A single controller STMS setup can never provide better failover against a dual controller setup for the following reasons:
>
> 1.If the controller fails in a single controller setup, there is no question of failover for stms.
> In a dual controller setup, atleast the controller failover
> can give some hope.
>
> 2. If the cabling is such that Ch0 and ch1 are involved, then
> in both a single and a dual controller setup, the failure is 100%
>
> 3. If the cable or sfp causes an issue, then in either case , the other path is open for I/O
>
> So from the above, the dual controller setup does have an advantage over the single controller setup by virtue of having a dual controller.
>
> Please do let me know your thoughts.
> So presently I am rejecting this document on the above counts.
>
Version: 0
Date: 2005-10-24
User Name: 31844
Action: Add Comment
Comment: Hi Sushil,
Thanks for your review and comments. Please find my replies inline below:
> Recipient Comment: Hanada-san,
> Could you please explain the objective of this document?
>
> Are you stressing upon the fact that the PBC is a single point of failure when ch0 and ch1 is used for connections?
>
> Or are you stressing upon the possibility of a controller failover mechanism failure in case of a dual controller setup.
I'm not stressing PBC nor controller failover mechanism failure with this document.
I've just wanted to highlight why ch0 & ch1 or ch4 & ch5 should not be used as dual
path connection ports for a single host. Because in such case a single FC controller chip
can be an SPOF.
>
>
> I ask because of the following reasons:
>
> 1. You speak of a single controller stms failover working better than a dual controller setup.
I've never intended to speak of such thing.
Do you mean following description in the Resolution section does not make sense?
To avoid such LUN lost case on SE3510, dual path should be connected to the different port on the different IOM and the different FC controller chips.
For example, if one host path was connected to the CH0 port on the top IOM, another host path should be connected to the CH4 or CH5 port on the bottom IOM.
>
> A single controller STMS setup can never provide better failover against a dual controller setup for the following reasons:
>
> 1.If the controller fails in a single controller setup, there is no question of failover for stms.
> In a dual controller setup, atleast the controller failover
> can give some hope.
>
> 2. If the cabling is such that Ch0 and ch1 are involved, then
> in both a single and a dual controller setup, the failure is 100%
>
> 3. If the cable or sfp causes an issue, then in either case , the other path is open for I/O
>
> So from the above, the dual controller setup does have an advantage over the single controller setup by virtue of having a dual controller.
>
> Please do let me know your thoughts.
> So presently I am rejecting this document on the above counts.
Version: 0
Date: 2005-10-24
User Name: 147406
Action: Rejected
Comment: Hanada-san,
Could you please explain the objective of this document?
Are you stressing upon the fact that the PBC is a single point of failure when ch0 and ch1 is used for connections?
Or are you stressing upon the possibility of a controller failover mechanism failure in case of a dual controller setup.
I ask because of the following reasons:
1. You speak of a single controller stms failover working better than a dual controller setup.
A single controller STMS setup can never provide better failover against a dual controller setup for the following reasons:
1.If the controller fails in a single controller setup, there is no question of failover for stms.
In a dual controller setup, atleast the controller failover
can give some hope.
2. If the cabling is such that Ch0 and ch1 are involved, then
in both a single and a dual controller setup, the failure is 100%
3. If the cable or sfp causes an issue, then in either case , the other path is open for I/O
So from the above, the dual controller setup does have an advantage over the single controller setup by virtue of having a dual controller.
Please do let me know your thoughts.
So presently I am rejecting this document on the above counts.
regards,
sushil
Version: 0
Date: 2005-10-23
User Name: 31844
Action: Add Comment
Comment: Certainly the SE3000 Family Best Practice Manual(P/N 816-7325-18) already
has the same DAS cabling diagram in the Chapter 4, but it looks just a sample
cabling and general procedure for creating such configuration, there is no
clear/emphasizing notes/descriptions which will state this is recommended
cabling and why such cabling is recommended.
That is one of motive why I've created this document.
Version: 0
Date: 2005-10-17
User Name: 113848
Action: Add Comment
Comment: Isn't this cabling method in Doc 82908 already described in the Best Practices Manuals ?
Granted, it is good that this doc explains WHY you should cable that way (channels 0 & 1 sharing ISP 2313 chip) , but the doc should refer to the cabling diagram in the best practices manual as well, shouldn't it ?
Version: 0
Date: 2005-10-16
User Name: 147406
Action: Accept
Comment:
Version: 0
Date: 2005-10-16
User Name: 31844
Action: Approved
Comment: please review
Version: 0
Date: 2005-10-16
User Name: 31844
Action: Created
Comment:
Version: 0
Product_uuid
58553d0e-11f4-11d7-9b05-ad24fcfd42fa|Sun StorageTek 3510 FC Array