Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Troubleshooting Sure Solution
1359373.1: Sun Fire[TM] Servers (V480, V490, V880, V890): How to Manually Decode the DIMM(s) in a Memory Error
Applies to:
Sun Fire V490 Server - Version Not Applicable and later
Sun Fire V880 Server - Version Not Applicable and later
Sun Fire V480 Server - Version Not Applicable and later
Sun Fire V880z Visualization Server - Version Not Applicable and later
Sun Fire V890 Server - Version Not Applicable and later
Information in this document applies to any platform.

Purpose
This document provides guidance on manually decoding memory locations on the entry-level servers listed above. Many errors report the DIMM location directly, but manual decoding is necessary for Red State Exceptions, Fatal Resets, and errors on systems running Solaris 8 Kernel Update Patch (KUP) 108528-15 or earlier. For those errors, you must decode the DIMM(s) manually from the AFSR and AFAR data given in the "Red State Exception", "Fatal Reset", or Solaris 8 KUP 108528-15 (or earlier) error message output.

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community - SPARC Legacy Servers.
Troubleshooting Steps
Memory Error Output from “/var/adm/messages”:

May 19 10:06:47 sf02 SUNW,UltraSPARC-III+: [ID 649096 kern.info] NOTICE: [AFT0] Corrected system bus (CE) Event detected by CPU7 at TL=0, errID 0x00000019.ea55b668
May 19 10:06:47 sf02 AFSR 0x00000002<CE>.000000b0 AFAR 0x000000d0.cfea4a20
May 19 10:06:47 sf02 Fault_PC 0x1009b110 Esynd 0x00b0 Slot D: J3001
May 19 10:06:47 sf02 SUNW,UltraSPARC-III+: [ID 311202 kern.info] [AFT0] errID 0x00000019.ea55b668 Corrected Memory Error on Slot D:J3001 is Persistent
May 19 10:06:47 sf02 SUNW,UltraSPARC-III+: [ID 291034 kern.info] [AFT0] errID 0x00000019.ea55b668 Data Bit 103 was in error and corrected
May 19 10:06:47 sf02 SUNW,UltraSPARC-III+: [ID 315577 kern.info] [AFT2] errID 0x00000019.ea55b668 PA=0x000000d0.cfea4a00
May 19 10:06:47 sf02 E$tag 0x00000343.3f000002 E$state_0 Exclusive
May 19 10:06:47 sf02 SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x00) 0xbaddcafe.baddcafe 0xbaddcafe.baddcafe ECC 0x0be
May 19 10:06:47 sf02 SUNW,UltraSPARC-III+: [ID 895151 kern.info] [AFT2] E$Data (0x10) 0xbaddcafe.baddcafe 0xbaddcafe.baddcafe ECC 0x0be

Summary of Steps (1-6) needed for Manual Decoding of DIMM:
Step #1 Find bit(s) in error using ECC Syndromes (Table #1)

Detailed Steps
Step #1 Find bit(s) in error using ECC Syndromes (Table #1):

AFSR=00000002<CE>.000000b0
The last three hex digits of the AFSR (0b0) expand to binary:

    0000 1011 0000 (binary) = 0b0 (hex)
       8 7654 3210 (bit numbers)

Reading from right to left, bits 8 - 0 are the ECC syndrome.

ECC syndrome (Esynd) = 0b0
x coordinate = 0
y coordinate = 0b
NOTE: Please write down this Data/ECC Check bit number. You will need it later in your calculations.
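
The bit extraction in Step #1 can be sketched in Python. This is only an illustration: the function names are hypothetical, and the x/y split (x taken from the lowest hex digit, y from the remaining upper bits) is inferred from the worked example above rather than from Table #1, which still has to be consulted to turn the coordinates into a Data/ECC check bit number.

```python
import re


def parse_split_hex(text):
    """Convert a 'high.low' hex string such as '0x00000002<CE>.000000b0' to an integer."""
    # Drop flag annotations like <CE> that the error message embeds in the AFSR.
    cleaned = re.sub(r"<[^>]*>", "", text.strip())
    high, low = cleaned.split(".")
    return (int(high, 16) << 32) | int(low, 16)


def ecc_syndrome(afsr):
    """Return bits 8-0 of the AFSR, which hold the ECC syndrome."""
    return parse_split_hex(afsr) & 0x1FF


def syndrome_coordinates(esynd):
    """Split the syndrome into (x, y) as in the worked example (assumed layout)."""
    return esynd & 0xF, esynd >> 4


esynd = ecc_syndrome("0x00000002<CE>.000000b0")
x, y = syndrome_coordinates(esynd)
print(f"Esynd = 0x{esynd:03x}, x = {x:x}, y = {y:02x}")
```

For the AFSR in the example message this yields a syndrome of 0b0 with x = 0 and y = 0b, matching the manual calculation.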
Here is an example of the memory configuration table (output from OBP/POST) for a fully configured Sun Fire V880 with 8-way memory interleaving:

CPU0 Bank0 128 + 128 + 128 + 128 : 512MB @ a000000000 8way #0
CPU0 Bank1 128 + 128 + 128 + 128 : 512MB @ a000000000 8way #2
CPU0 Bank2 128 + 128 + 128 + 128 : 512MB @ a000000000 8way #4
CPU0 Bank3 128 + 128 + 128 + 128 : 512MB @ a000000000 8way #6
CPU1 Bank0 128 + 128 + 128 + 128 : 512MB @ b000000000 8way #0
CPU1 Bank1 128 + 128 + 128 + 128 : 512MB @ b000000000 8way #2
CPU1 Bank2 128 + 128 + 128 + 128 : 512MB @ b000000000 8way #4
CPU1 Bank3 128 + 128 + 128 + 128 : 512MB @ b000000000 8way #6
CPU2 Bank0 128 + 128 + 128 + 128 : 512MB @ a000000000 8way #1
CPU2 Bank1 128 + 128 + 128 + 128 : 512MB @ a000000000 8way #3
CPU2 Bank2 128 + 128 + 128 + 128 : 512MB @ a000000000 8way #5
CPU2 Bank3 128 + 128 + 128 + 128 : 512MB @ a000000000 8way #7
CPU3 Bank0 128 + 128 + 128 + 128 : 512MB @ b000000000 8way #1
CPU3 Bank1 128 + 128 + 128 + 128 : 512MB @ b000000000 8way #3
CPU3 Bank2 128 + 128 + 128 + 128 : 512MB @ b000000000 8way #5
CPU3 Bank3 128 + 128 + 128 + 128 : 512MB @ b000000000 8way #7
CPU4 Bank0 128 + 128 + 128 + 128 : 512MB @ c000000000 8way #0
CPU4 Bank1 128 + 128 + 128 + 128 : 512MB @ c000000000 8way #2
CPU4 Bank2 128 + 128 + 128 + 128 : 512MB @ c000000000 8way #4
CPU4 Bank3 128 + 128 + 128 + 128 : 512MB @ c000000000 8way #6
CPU5 Bank0 128 + 128 + 128 + 128 : 512MB @ d000000000 8way #0
CPU5 Bank1 128 + 128 + 128 + 128 : 512MB @ d000000000 8way #2
CPU5 Bank2 128 + 128 + 128 + 128 : 512MB @ d000000000 8way #4
CPU5 Bank3 128 + 128 + 128 + 128 : 512MB @ d000000000 8way #6
CPU6 Bank0 128 + 128 + 128 + 128 : 512MB @ c000000000 8way #1
CPU6 Bank1 128 + 128 + 128 + 128 : 512MB @ c000000000 8way #3
CPU6 Bank2 128 + 128 + 128 + 128 : 512MB @ c000000000 8way #5
CPU6 Bank3 128 + 128 + 128 + 128 : 512MB @ c000000000 8way #7
CPU7 Bank0 128 + 128 + 128 + 128 : 512MB @ d000000000 8way #1
CPU7 Bank1 128 + 128 + 128 + 128 : 512MB @ d000000000 8way #3
CPU7 Bank2 128 + 128 + 128 + 128 : 512MB @ d000000000 8way #5
CPU7 Bank3 128 + 128 + 128 + 128 : 512MB @ d000000000 8way #7

NOTE: On Sun Fire V480/V880 systems, 8-way interleaving is the most common interleave group. If you have multiple interleave groups on the same board, you must find out which group the physical address belongs to (see the multiple interleave groups example below).
CPU0 Bank1 128 + 128 + 128 + 128 : 512MB @ a000000000 4way #0
CPU0 Bank3 128 + 128 + 128 + 128 : 512MB @ a000000000 4way #2
CPU2 Bank0 128 + 128 + 128 + 128 : 512MB @ a080000000 2way #0
CPU2 Bank1 128 + 128 + 128 + 128 : 512MB @ a000000000 4way #1
CPU2 Bank2 128 + 128 + 128 + 128 : 512MB @ a080000000 2way #1
CPU2 Bank3 128 + 128 + 128 + 128 : 512MB @ a000000000 4way #3

CPU0 Bank1 128 + 128 + 128 + 128 : 512MB @ a000000000 4way #0
CPU2 Bank1 128 + 128 + 128 + 128 : 512MB @ a000000000 4way #1
CPU0 Bank3 128 + 128 + 128 + 128 : 512MB @ a000000000 4way #2
CPU2 Bank3 128 + 128 + 128 + 128 : 512MB @ a000000000 4way #3

CPU2 Bank0 128 + 128 + 128 + 128 : 512MB @ a080000000 2way #0
CPU2 Bank2 128 + 128 + 128 + 128 : 512MB @ a080000000 2way #1
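
Sorting the banks into their interleave groups, as done by hand above, can be sketched in Python. This is a minimal illustration that assumes the OBP/POST lines follow exactly the layout shown in this document; the regular expression and the function name `group_banks` are hypothetical and not part of any Oracle tool.

```python
import re
from collections import defaultdict

# Matches lines such as:
# CPU2 Bank0 128 + 128 + 128 + 128 : 512MB @ a080000000 2way #0
LINE = re.compile(
    r"CPU(?P<cpu>\d+)\s+Bank(?P<bank>\d+)\s+.*?:\s*(?P<size>\d+)MB\s*@\s*"
    r"(?P<base>[0-9a-fA-F]+)\s+(?P<ways>\d+)way\s+#(?P<index>\d+)"
)


def group_banks(text):
    """Group banks by (base address, interleave width), ordered by logical bank number."""
    groups = defaultdict(list)
    for m in LINE.finditer(text):
        key = (int(m["base"], 16), int(m["ways"]))
        groups[key].append((int(m["index"]), int(m["cpu"]), int(m["bank"])))
    return {key: sorted(banks) for key, banks in groups.items()}


SAMPLE = """\
CPU0 Bank1 128 + 128 + 128 + 128 : 512MB @ a000000000 4way #0
CPU0 Bank3 128 + 128 + 128 + 128 : 512MB @ a000000000 4way #2
CPU2 Bank0 128 + 128 + 128 + 128 : 512MB @ a080000000 2way #0
CPU2 Bank1 128 + 128 + 128 + 128 : 512MB @ a000000000 4way #1
CPU2 Bank2 128 + 128 + 128 + 128 : 512MB @ a080000000 2way #1
CPU2 Bank3 128 + 128 + 128 + 128 : 512MB @ a000000000 4way #3
"""

for (base, ways), banks in sorted(group_banks(SAMPLE).items()):
    print(f"{ways}-way group @ {base:010x}")
    for index, cpu, bank in banks:
        print(f"  #{index}: CPU{cpu} Bank{bank}")
```

Each group lists its banks in logical bank order, which reproduces the 4-way and 2-way groupings shown above.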
CPUs 0 and 2 interleave memory starting @ 0x a000000000
CPUs 1 and 3 interleave memory starting @ 0x b000000000
CPUs 4 and 6 interleave memory starting @ 0x c000000000
CPUs 5 and 7 interleave memory starting @ 0x d000000000

Very Important: Interleaving of memory is *Per CPU/Memory board slot only* and not across CPU/Memory board slots.

Slot A (CPUs 0 + 2)
Slot B (CPUs 1 + 3)
Slot C (CPUs 4 + 6)
Slot D (CPUs 5 + 7)

NOTE: The above example is shown with 8 CPUs for V880 purposes only. On a V480 (only 4 CPUs) you will see only the top half of this table (CPUs 0, 1, 2, 3).

AFAR=0x000000d0.cfea4a20

The last three hex digits of the AFAR (a20) expand to binary:

    1010 0010 0000
      98 7654 3210 (bit numbers)

Reading from right to left, these are bits 11 - 0, but we are only interested in bits 9 - 6, because they correspond to the four bits of the lower mask value LM[3 - 0] in Table #2 (Memory Interleaving and Logical bank #'s).

Bits 9 - 6 = 1000
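
The same bit extraction can be sketched in Python. The function names are hypothetical, and the sketch only isolates bits 9 - 6 of the AFAR; mapping the resulting four-bit value to a logical bank number still requires Table #2, which is not reproduced here.

```python
def parse_afar(afar):
    """Convert an AFAR string such as '0x000000d0.cfea4a20' to an integer."""
    high, low = afar.strip().split(".")
    return (int(high, 16) << 32) | int(low, 16)


def afar_interleave_bits(afar):
    """Return bits 9-6 of the AFAR, the bits compared against LM[3-0]."""
    return (parse_afar(afar) >> 6) & 0xF


bits = afar_interleave_bits("0x000000d0.cfea4a20")
print(f"AFAR bits 9-6 = {bits:04b}")
```

For the AFAR in the example message this gives 1000, the same value obtained by hand.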
AFAR 0x000000d0.cfea4a20
a0.xxxxxxxx > Slot A (CPUs 0 + 2)
b0.xxxxxxxx > Slot B (CPUs 1 + 3)
c0.xxxxxxxx > Slot C (CPUs 4 + 6)
d0.xxxxxxxx > Slot D (CPUs 5 + 7)
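
The lookup from the AFAR prefix to a CPU/Memory board slot can be sketched as a small table in Python. The dictionary simply restates the four prefixes listed above; the names `SLOTS` and `slot_for_afar` are hypothetical, and the sketch assumes the standard base addresses shown in this document.

```python
# Prefix (high word of the AFAR) -> (slot letter, CPUs on that board).
SLOTS = {
    0xA0: ("A", (0, 2)),
    0xB0: ("B", (1, 3)),
    0xC0: ("C", (4, 6)),
    0xD0: ("D", (5, 7)),
}


def slot_for_afar(afar):
    """Return (slot, cpus) for an AFAR such as '0x000000d0.cfea4a20', or None if unknown."""
    high, _low = afar.strip().split(".")
    prefix = int(high, 16) & 0xFF
    return SLOTS.get(prefix)


slot, cpus = slot_for_afar("0x000000d0.cfea4a20")
print(f"Slot {slot} (CPUs {cpus[0]} + {cpus[1]})")
```

For the example AFAR the prefix is d0, so the address belongs to Slot D (CPUs 5 + 7), consistent with the "Slot D" reported in the error message.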
CPU5 Bank0 128 + 128 + 128 + 128 : 512MB @ d000000000 8way #0
DIMMs J2900, J2901, J3000, J3001
NOTE: Checking the above memory error output we can see the DIMM in error (J3001) is part of our calculated four DIMMs.
References
<NOTE:1370243.1> - CPU / Memory DIMM Location Map Sun Fire V480 V490 V880 V880z V890