Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution

2048556.1: FS System: Low Performance Storage Tiers with Insufficient Drive Groups can Cause Latencies
Created from <SR 3-11131042491>

Applies to:
Oracle FS1-2 Flash Storage System - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms
This document highlights potential performance issues when the low-performance tier (Capacity HDD) does not have enough Drive Groups.

Changes
Performance is affected when multiple accesses to the Capacity HDD Drive Groups occur simultaneously, for example during weekly backups.

Cause
Auto-Tiering uses the following Storage Classes as Tiers, sorted by performance in descending order:
Over time, LUN hot spots (heavy IO) are migrated to higher tiers, whereas LUNs receiving fewer IO requests are migrated to lower tiers. Hot data therefore goes to the Performance SSD Drive Groups, some proportion of warm data to the 6 Performance Disk Drive Groups, and the remainder of cold data to the 2 Capacity HDD Drive Groups. The latencies generated by the slow tier under concurrent access can affect the overall performance of the FS1-2. This is especially true for Auto-Tier LUNs that must wait for cold data to be processed before the hosts can query other data.

Solution
The solution is to add more Capacity HDD Drive Enclosures to improve the response time of that Tier. There are a few workarounds:
In addition, Auto-Tier LUNs should not have their Initial Storage Class set to Capacity Disk if that class has only 2 Drive Groups. The Initial Storage Class can be viewed or modified in the FS1 GUI under SAN -> Storage -> LUNs, using the View or Modify options in the Actions menu. The user might have to uncheck the Use Storage Profile option before the Initial Storage Class can be modified.
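The tier placement described in the Cause section can be sketched as follows. This is an illustrative model only: the tier names come from this article, but the extent IO counts and the one-third/two-thirds split are hypothetical and do not represent the FS1-2's actual QoS algorithm.

```python
# Illustrative sketch of Auto-Tiering placement: the hottest extents rise to
# Performance SSD, warm extents go to Performance Disk, and cold extents sink
# to Capacity HDD.  The split points below are hypothetical.

def assign_tiers(extent_io_counts):
    """Map {extent_id: io_count} to {extent_id: tier_name}, hottest first."""
    ranked = sorted(extent_io_counts, key=extent_io_counts.get, reverse=True)
    n = len(ranked)
    tiers = {}
    for i, extent in enumerate(ranked):
        if i < n // 3:                      # hottest third
            tiers[extent] = "Performance SSD"
        elif i < 2 * n // 3:                # warm middle third
            tiers[extent] = "Performance Disk"
        else:                               # cold remainder
            tiers[extent] = "Capacity HDD"
    return tiers

io_counts = {"e1": 9000, "e2": 42, "e3": 3100, "e4": 7, "e5": 880, "e6": 15000}
print(assign_tiers(io_counts))
```

With only 2 Drive Groups in the Capacity HDD class, everything the model puts in the cold bucket contends for the same two Drive Groups, which is the congestion this article describes.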
To confirm the latencies, run /cores_data/local/tools/pillar/bin/getTraceDataR6.pl against the logs of both controllers (where the file output*.txt is located).

Example:

---------------------------------------------
- Summary of all BRICK_DG -
---------------------------------------------
______________________________________________________________________________________________________________________________________
Object Tree       First            Last          Total   Max     Avg      Min      Max       Avg      Min       Max       Avg
                  (trace set)    - (trace set)   (sets)  Length  Length   (ms)     (ms)      (ms)     (MB/sec)  (MB/sec)  (MB/sec)
______________________________________________________________________________________________________________________________________
All BRICK_DG  WT  08/05-17:22:29 - 08/05-17:22:46  3486  1280     89      0.51   4850.87    144.82    0.00    230.13     0.32
              RD  08/05-17:22:31 - 08/05-17:22:46   116  1280   1242      0.54     87.83     13.36    7.46    945.54    47.57
dgH[0x0003]   WT  08/05-17:22:33 - 08/05-17:22:44     8    84     50     16.30     62.59     28.55    0.17      2.32     0.91
              RD  08/05-17:22:31 - 08/05-17:22:43    23  1280   1280     14.35     87.83     35.66    7.46     45.67    18.38
dgH[0x0002]   WT  08/05-17:22:29 - 08/05-17:22:46   462   864     66      1.77     56.74     13.90    0.29     34.12     2.42
              RD  08/05-17:22:31 - 08/05-17:22:46    28  1280   1280      0.78     21.00      8.62   31.21    837.98    76.05
dgH[0x0006]   WT  08/05-17:22:29 - 08/05-17:22:46  1582  1159     35      0.51      8.92      1.11    0.10    230.13    16.06
              RD  08/05-17:22:31 - 08/05-17:22:32     3   256    181      0.54      0.95      0.81   30.58    137.77   114.14
dgH[0x0008]   WT  08/05-17:22:29 - 08/05-17:22:45   230   416     68      1.97     41.87     12.76    0.31     15.08     2.74
dgH[0x0007]   WT  08/05-17:22:29 - 08/05-17:22:46   468   232     61      3.25     55.36     17.83    0.13      9.43     1.74
              RD  08/05-17:22:31 - 08/05-17:22:46    24  1280   1280      0.83     24.14     12.40   27.15    787.27    52.84
dgH[0x0000]   WT  08/05-17:22:29 - 08/05-17:22:46   190  1280    461     50.07   4850.87   1902.21    0.00      0.43     0.12
dgH[0x0001]   WT  08/05-17:22:29 - 08/05-17:22:46   256   934     70      3.61     34.06     12.86    0.37     27.35     2.80
              RD  08/05-17:22:31 - 08/05-17:22:45    32  1280   1280      0.69     15.15      2.73   43.24    945.54   240.15
dgH[0x0004]   WT  08/05-17:22:31 - 08/05-17:22:46    49   608     14      3.49     20.61      8.51    0.02     17.90     0.85
              RD  08/05-17:22:31 - 08/05-17:22:46     6  1280   1088      7.37     21.51     16.90    8.89     47.01    32.96
dgH[0x0005]   WT  08/05-17:22:29 - 08/05-17:22:46   241  1280    309      7.31   3621.53    498.04    0.00      8.78     0.32

In this case, Drive Groups 0 and 5 are on the Capacity HDD DE and their average response time is above 400 milliseconds.
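The manual check above (scanning the Avg (ms) column for Drive Groups above 400 ms) can be sketched in a few lines. The per-Drive-Group averages below are copied from the example output; the 400 ms cutoff is the rule of thumb this article applies, and the exact output format of getTraceDataR6.pl may vary between releases.

```python
# Hedged sketch: flag Drive Groups whose average write response time exceeds
# a threshold, as was done by eye in the getTraceDataR6.pl summary above.

LATENCY_THRESHOLD_MS = 400.0

# "Avg (ms)" values for the WT rows of each Drive Group in the example.
wt_avg_ms = {
    "dgH[0x0000]": 1902.21,
    "dgH[0x0001]": 12.86,
    "dgH[0x0002]": 13.90,
    "dgH[0x0003]": 28.55,
    "dgH[0x0004]": 8.51,
    "dgH[0x0005]": 498.04,
    "dgH[0x0006]": 1.11,
    "dgH[0x0007]": 17.83,
    "dgH[0x0008]": 12.76,
}

slow = sorted(dg for dg, avg in wt_avg_ms.items() if avg > LATENCY_THRESHOLD_MS)
print(slow)  # the Capacity HDD Drive Groups stand out
```

On this data the two flagged Drive Groups are exactly the ones on the Capacity HDD DE.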
Finally, run /cores_data/local/tools/pillar/FlashStorage/R6_RAID_PortAnalysis.pl against both controllers' logs and check that all the drives in both Drive Groups have high Average Ticks and Max Ticks compared to the rest of the drives.

Example:

INSTANCE 0
portNum | Average Ticks | Max Ticks | Total IOs |
----------------------------------------------------------------
091            0.65          2          43
092            0.35          2          31
093            0.38          1          65
094            0.34          1          38
095            0.33          1          48
096          107.98        241          61
097          114.24        242          72
098           68.93        248          76
099           77.39        241         122
100           81.45        240         106
101           82.14        241          69
102          119.16        242          74
103           93.16        242          81
104          111.98        242          61
105           95.68        242          65
106           49.63        226          79
107           74.96        244          74
108           23.27        127          64
109           14.49        105          71
110           20.95        128          61
111            7.76         67          79
112            7.59         38          90
113           19.44        240         106
114           10.04        116          94
115           29.31        183          62
116           35.57        195          47
117           28.49        128          39
118            6.57        117          53
119           12.71        141          70

The 24 drives shown in bold in the original output (ports 096 through 119) belong to the Capacity HDD DE.
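The same comparison can be sketched programmatically: separate the ports whose Average Ticks sits far above the rest. A subset of the example's (portNum, Average Ticks) pairs is hard-coded below, and the "above the overall mean" cutoff is only an illustrative heuristic, not part of R6_RAID_PortAnalysis.pl.

```python
# Hedged sketch: identify drive ports with unusually high Average Ticks,
# mirroring the manual comparison in the port-analysis output above.
from statistics import mean

avg_ticks = {
    "091": 0.65, "092": 0.35, "093": 0.38, "094": 0.34, "095": 0.33,
    "096": 107.98, "097": 114.24, "098": 68.93, "099": 77.39,
    "100": 81.45, "101": 82.14, "102": 119.16, "103": 93.16,
}

# Illustrative cutoff: anything above the overall mean is suspect here,
# because the healthy ports in this log sit well below 1 tick.
cutoff = mean(avg_ticks.values())
outliers = sorted(port for port, ticks in avg_ticks.items() if ticks > cutoff)
print(outliers)
```

In the real case, all 24 ports of the Capacity HDD DE land in the outlier group while the other ports do not.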
Verify with '/cores_data/local/tools/pillar/FlashStorage/FSInfo.pl -u' whether there are any discrepancies in the tier allocations.
Another tip for finding the allocation of a specific Auto-Tier LUN: search for the LUN in A1*.chsh.xml and scroll down to the GeometryCapacities tag:

<GeometryCapacities>
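Locating that tag can also be done with a small script. The GeometryCapacities element body was truncated in this article and the surrounding schema is not shown, so the "Lun" and "Name" tag names below are assumptions that must be adjusted to match the real A1*.chsh.xml structure; the sketch simply dumps whatever children the element has.

```python
# Hedged sketch: find a LUN's <GeometryCapacities> element in an
# A1*.chsh.xml dump.  "Lun" and "Name" are assumed tag names -- the real
# schema is not shown in this article -- so adjust them to the actual file.
import glob
import xml.etree.ElementTree as ET

def find_geometry_capacities(lun_name, pattern="A1*.chsh.xml"):
    """Return {child_tag: text} of the LUN's GeometryCapacities, or None."""
    for path in glob.glob(pattern):
        root = ET.parse(path).getroot()
        for lun in root.iter("Lun"):                  # assumed tag name
            if lun.findtext("Name", default="") == lun_name:   # assumed
                geo = lun.find(".//GeometryCapacities")
                if geo is not None:
                    # dump whatever capacity children the schema provides
                    return {child.tag: child.text for child in geo}
    return None
```

This avoids scrolling through the XML by hand when several controller dumps have to be checked.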
References
<BUG:21633679> - LATENCIES CAUSED BY HOTSPOT ON CAPACITY DISK ENCLOSURE USING SLAT

Attachments
This solution has no attachment