Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Technical Instruction Sure Solution

1018758.1 : Sun Fire[TM] 12K/15K/E20K/E25K: Split Expander Considerations
PreviouslyPublishedAs 230485

Applies to:
Sun Fire 15K Server - Version Not Applicable and later
Sun Fire E25K Server - Version Not Applicable and later
Sun Fire 12K Server - Version Not Applicable to Not Applicable [Release N/A]
Sun Fire E20K Server - Version Not Applicable to Not Applicable [Release N/A]
All Platforms
***Checked for relevance on 14-JAN-2014***

Goal

This document covers the items to be aware of when configuring a split-expander domain(s) in a Sun Fire[TM] 12K/15K/E20K/E25K platform.

Solution

Definition and Purpose:

A split expander, where the associated slot boards for a given expander are assigned to different domains, is a valid, supported configuration option in a Sun Fire 12K/15K/E20K/E25K. However, there are some behaviors of split expander that should be considered prior to implementing a split configuration. The information below is not intended to be read as "don't use split expander". But, for some customers, the limitations imposed on the system may dissuade its use.

1. Performance

Memory accesses through a split expander take an additional two clock cycles (13 ns), increasing overall latency of memory operations. If all 18 expanders in a 15K/E25K are split, memory latency to other board sets increases ~6%.

2. Single Point Of Failure

By the very nature of sharing a component between two domains, that component becomes a single point of failure for the two domains. The failure of a split expander will interrupt both domains it serves. Translating to Mean Time Between Failures (MTBF), if all 18 expanders in a 15K/E25K are split, the MTBF is decreased (made worse) by ~5%.

3. Residual RStops

Consider the following configuration in which EX1 is split between domains A and B:

    +-----+-----+-----+
    |     |     |     |
    | SB0 | SB1 | SB2 |
    | (A) | (A) | (B) |
    |     |     |     |
    +-----+-----+-----+
    | IO0 | IO1 | IO2 |
    | (A) | (B) | (B) |
    |     |     |     |
    +-----+-----+-----+
      EX0   EX1   EX2

Now, suppose that a stop condition occurs in SB1. The stop can be either a Dstop or an Rstop.
The stop condition requires that history recording in the ASICs serving Domain A be frozen until a hardware state dump can be taken. So history recording must be frozen on both EX0 and EX1 because both these expanders serve the domain with the stop condition.

While the expander ASICs can service individual slots with complete separation, there is only a single set of history registers per ASIC. These history registers track transactions through the ASIC regardless of slot.

In this example, the end result is that the stop condition in Domain A requires a history freeze for EX1. When history is frozen in EX1, this effectively freezes history recording in Domain B, resulting in an Rstop condition on Domain B. SMS will gather a hardware state dump for both Domain A and Domain B, but the dump for Domain B will, in all likelihood, be error free and uninteresting for diagnosis.

A residual Rstop is only created when the source of a stop condition is in a split expander.

4. Delays in hpost Executions

Whenever hpost executes against a set of domain resources, it requires exclusive access to the expander(s) for that domain. To achieve exclusivity, hpost places a lock on the expander(s) at the start of an hpost run. If another hpost process wants to access a locked expander, it must wait.

Consider the example above again. Suppose a high level POST is running on Domain A. Any attempt to run hpost for Domain B must wait until the POST of A is complete because the POST for Domain A holds the lock on EX1. For larger domains, the wait time can be significant. This applies to any invocation of hpost, including:

    o setkeyswitch operations
    o Collection of hardware state dumps (Dstop/Rstop)
    o Domain reboots
    o Domain recovery actions

5. Centerplane Bus Degradation

In a split expander configuration, the ASICs on the expanders handle domain isolation (via the AXQ and SDI Domain Mask Registers).
But from the centerplane's point of view, it exchanges address and data information with expanders, not slot boards. This introduces the concept of a set of communicating expanders, or SOCX. Refer to the setbus man page for more details.

In the example above, the expander ASICs ensure that, for example, Domain A's SB1 cannot transmit information to Domain B's IO2. But, within the centerplane, EX1 is (must be) able to communicate with EX2. Likewise, EX1 is (must be) able to communicate with EX0. Because EX1 is split, the SOCX is EX0, EX1 and EX2.

The reason is the address, data, and response busses on the centerplane. An expander must agree with the other expanders it can communicate with on which busses are available for transmission. Suppose the address bus for EX0 was degraded to only use the low half of the centerplane. Since EX0 can communicate with EX1, EX1 must also only use the low address bus. Then, since EX1 is impacted, EX2 must also be degraded. The entire SOCX is impacted if any of its member expanders is degraded.

So, using split expanders increases the SOCX, and thereby decreases the granularity of bus degradation. This is evident when trying to use the setbus command:

    % setbus -c cs0 -b a EX0
    The expander in slot 1 communicates with slots not already listed, and will be added to the list of boards to reconfigure.
    Are you sure you want to continue the reconfiguration (yes/no)

Finally, degradation can cascade if a configuration is "staggered" across multiple split expanders. Consider the following configuration, adding Domain C:

    +-----+-----+-----+-----+-----+
    |     |     |     |     |     |
    | SB0 | SB1 | SB2 | SB3 | SB4 |
    | (A) | (A) | (B) | (B) | (C) |
    |     |     |     |     |     |
    +-----+-----+-----+-----+-----+
    | IO0 | IO1 | IO2 | IO3 | IO4 |
    | (A) | (B) | (B) | (C) | (C) |
    |     |     |     |     |     |
    +-----+-----+-----+-----+-----+
      EX0   EX1   EX2   EX3   EX4

Degrading the bus configuration of any of these expanders affects all of them. The SOCX is all five expanders.

6. MaxCPU Configuration

To allow MaxCPU boards to be used in a split expander configuration on a Sun Fire[TM] 12K/15K platform, you need Expander AXQ revision 6.3 in addition to the appropriate revision of the SMS HPOST patch: Patch ID 114608-09 or higher for SMS 1.3, or Patch ID 117371-02 or higher for SMS 1.4.1. This HPOST fix will be integrated into SMS 1.5 and SMS 1.6.

7. USIV+ Configuration

In rare cases, domains using USIV+ 1.8 or 1.95 GHz system boards in split expander configurations are at risk of resetting with a Dstop and rebooting unexpectedly. This issue has only been seen on systems running SMS 1.6 with either a 1.8 or 1.95 GHz system board in a split configuration. This issue can be avoided by not using these system boards in a split configuration.
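
The SOCX grouping described in section 5 can be sketched in a few lines of Python. This is a simplified model, not the algorithm that SMS or setbus actually uses: it assumes that two expanders communicate whenever they host boards from the same domain, and that a SOCX is a connected group under that relation. The function name `compute_socx` and the input layout are hypothetical, chosen for illustration only.

```python
from collections import defaultdict


def compute_socx(expanders):
    """Group expanders into sets of communicating expanders (SOCX).

    `expanders` maps an expander name to the set of domains whose
    boards it hosts (hypothetical layout, for illustration only).
    Assumption: expanders sharing a domain must communicate, and the
    relation is transitive across split expanders.
    """
    # Index: domain -> expanders hosting at least one of its boards.
    by_domain = defaultdict(set)
    for name, domains in expanders.items():
        for domain in domains:
            by_domain[domain].add(name)

    seen = set()
    groups = []
    for start in expanders:
        if start in seen:
            continue
        # Depth-first walk over "shares a domain with" links.
        group = set()
        stack = [start]
        while stack:
            name = stack.pop()
            if name in group:
                continue
            group.add(name)
            for domain in expanders[name]:
                stack.extend(by_domain[domain] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# The two layouts from section 5 (SB slot domain, IO slot domain).
three = {"EX0": {"A"}, "EX1": {"A", "B"}, "EX2": {"B"}}
five = {"EX0": {"A"}, "EX1": {"A", "B"}, "EX2": {"B"},
        "EX3": {"B", "C"}, "EX4": {"C"}}

print(compute_socx(three))  # [['EX0', 'EX1', 'EX2']]
print(compute_socx(five))   # [['EX0', 'EX1', 'EX2', 'EX3', 'EX4']]
```

Under this model, an unsplit layout such as `{"EX0": {"A"}, "EX1": {"B"}}` yields two separate groups, which matches the point that splitting is what enlarges the SOCX.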

PTS provides a script to locate split expanders either on a live system or against Explorer output. The script is available at the following URL: 12K/15K/E20K/E25K Domain Mapper

When examining a state dump, the SplitSlotEnbl bit in an expander's master SDI can be examined to determine if the expander is split or not. For example:

    redxl> shsdi -v 0
    Note: Data is displayed from the currently loaded dump file.

    SDI EX00/S0  Component ID = 54317049
    Master_Reset_Config[31:0] = 00000060
             0  SDI_diserrlog         MResC[0]       => SDI Intern Reset
             0  Slot0_diserrlog       MResC[1]
             0  Slot1_diserrlog       MResC[2]
          0x00  ExpID[4:0]            MResC[28:24]
             0  Mode[2:0]             MResC[31:29]   Master (0)
    Master_Stop_Config[31:0] = 01001113
             1  DstopEnbl             MStopC[0]
             1  RstopEnbl             MStopC[1]
             0  SCIntEnbl             MStopC[2]
             0  L1Err->ErrPause       MStopC[3]
             1  Dstop->ErrPause       MStopC[4]
             0  L1Ecc->ScInt[1:0]     MStopC[6:5]
             2  L1Ecc->Rstop[1:0]     MStopC[8:7]
             0  L1Err->ScInt[1:0]     MStopC[10:9]   L1Slot asserted err
             2  L1Err->Dstop[1:0]     MStopC[12:11]  L1Slot asserted err
             0  SBBCErr->SCInt        MStopC[13]
             0  SBBCErr->Dstop        MStopC[14]
             1  EnblStopReqChk        MStopC[24]
             0  L1Dstop->ExpDStop     MStopC[28]
             0  AnyDstop->ExpDStop    MStopC[29]
             0  Dstop->DReset         MStopC[30]     For split exp
             0  ShiftErrPausePhase    MStopC[31]
    Core_Config[21:0] = 0D9142
             0  Pass4TargIDDisbl      CoreC[0]       Rev 4+
             1  Slot1=SerDom1         CoreC[1]       Rev 4+
       -->   0  SplitSlotEnbl         CoreC[5]       In master SDI (0)
    ....

SplitSlotEnbl = 1 indicates the expander is configured for split slot.

@ Notes regarding USIV+ 1.8 and 1.95 GHz boards:
When a customer is affected by CR 6852877,

Attachments
This solution has no attachment
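
The SplitSlotEnbl check shown in the shsdi output above can also be done by hand from the raw Core_Config value. The sketch below assumes only what the listing shows, namely that SplitSlotEnbl is bit 5 of Core_Config (CoreC[5]) in the master SDI; the helper name `is_split` is hypothetical and is not part of redx or SMS.

```python
# Bit position taken from the shsdi listing: SplitSlotEnbl is CoreC[5].
SPLIT_SLOT_ENBL_BIT = 5


def is_split(core_config_hex):
    """Return True if the SplitSlotEnbl bit is set in a Core_Config value.

    `core_config_hex` is the hex string printed by shsdi for the
    master SDI, e.g. "0D9142" (hypothetical helper, for illustration).
    """
    value = int(core_config_hex, 16)
    return bool((value >> SPLIT_SLOT_ENBL_BIT) & 1)


# Core_Config from the example dump: bit 5 is clear, so not split.
print(is_split("0D9142"))  # False
```

For the example dump, the low byte of 0D9142 is 0x42 (binary 0100 0010), so bit 5 is 0, which agrees with the `0 SplitSlotEnbl` line in the listing.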