Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2031193.1
Update Date:2018-05-30
Keywords:

Solution Type  Problem Resolution Sure

Solution  2031193.1 :   Oracle ZFS Storage Appliance: Memory was not online after DIMM replacement and fault cleared  


Related Items
  • Sun ZFS Storage 7420
  • Oracle ZFS Storage ZS5-2
  • Oracle ZFS Storage ZS3-2
  • Oracle ZFS Storage ZS4-4
  • Oracle ZFS Storage ZS5-4
  • Oracle ZFS Storage ZS3-4
  • Sun ZFS Storage 7120
  • Sun ZFS Storage 7320
  • Oracle ZFS Storage ZS3-BA
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS




In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-10681759961>

Applies to:

Oracle ZFS Storage ZS3-2 - Version All Versions and later
Oracle ZFS Storage ZS3-4 - Version All Versions and later
Oracle ZFS Storage ZS3-BA - Version All Versions and later
Sun ZFS Storage 7420 - Version All Versions and later
Sun ZFS Storage 7320 - Version All Versions and later
7000 Appliance OS (Fishworks)

Symptoms

We observed a memory fault in the ILOM logs:

    Errors: fault.cpu.intel.quickpath.mem_scrub

 

## The 'hardware view' shows NO issues with the memory DIMMs for the 'fpzs3cont02' appliance:

             NAME         STATE     MANUFACTURER            MODEL                                       SERIAL      
chassis-000  testnas      ok        Oracle                  Oracle ZFS Storage ZS3-2                    xxxxNM200X           
cpu-000      CPU 0        ok        Intel                   Intel(r) Xeon(r) CPU E5-2658 0 @ 2.10GHz    unknown
cpu-001      CPU 1        ok        Intel                   Intel(r) Xeon(r) CPU E5-2658 0 @ 2.10GHz    unknown
disk-000     HDD 0        ok        HITACHI                 H109090SESUN900G                            001406A1UWPJ        KVK1UWPJ 
disk-001     HDD 1        ok        HITACHI                 H109090SESUN900G                            001406A04YAJ        KVK04YAJ 
disk-002     HDD 2        ok        SANDISK                 LB1606R--SUN1.6T                            41030692             
disk-003     HDD 3        absent    -                       -                                           -                    
disk-004     HDD 4        absent    -                       -                                           -                    
disk-005     HDD 5        absent    -                       -                                           -                    
disk-006     HDD 6        absent    -                       -                                           -                    
disk-007     HDD 7        absent    -                       -                                           -                    
fan-000      FM 0         absent    Oracle                  7019706                                     -
fan-001      FM 1         absent    Oracle                  7019706                                     -
fan-002      FM 2         absent    Oracle                  7019706                                     -
fan-003      FM 3         absent    Oracle                  7019706                                     -
fan-004      FM 4         absent    Oracle                  7019706                                     -
memory-000   DIMM 0/7     ok        Samsung                 16384MB DDR3 SDRAM DIMM                     00CE03140815574F96
memory-001   DIMM 0/6     ok        Samsung                 16384MB DDR3 SDRAM DIMM                     00CE0314081557547F
memory-002   DIMM 0/5     ok        Samsung                 16384MB DDR3 SDRAM DIMM                     00CE03140815575674
memory-003   DIMM 0/4     ok        Samsung                 16384MB DDR3 SDRAM DIMM                     00CE03140815574F3B
memory-004   DIMM 0/0     ok        Samsung                 16384MB DDR3 SDRAM DIMM                     00CE03140815574E36
memory-005   DIMM 0/1     ok        Samsung                 16384MB DDR3 SDRAM DIMM                     00CE03140815575921
memory-006   DIMM 0/2     ok        Samsung                 16384MB DDR3 SDRAM DIMM                     00CE03140815574829
memory-007   DIMM 0/3     ok        Samsung                 16384MB DDR3 SDRAM DIMM                     00CE031408155754F2
memory-008   DIMM 1/7     ok        Samsung                 16384MB DDR3 SDRAM DIMM                     00CE031408155745AD
memory-009   DIMM 1/6     ok        Samsung                 16384MB DDR3 SDRAM DIMM                     00CE03140815574828
memory-010   DIMM 1/5     ok        Samsung                 16384MB DDR3 SDRAM DIMM                     00CE03140815574286
memory-011   DIMM 1/4     ok        Samsung                 16384MB DDR3 SDRAM DIMM                     00CE03140815574D61
memory-012   DIMM 1/0     ok        Samsung                 16384MB DDR3 SDRAM DIMM                     00CE03140815574F3C
memory-013   DIMM 1/1     ok        Samsung                 16384MB DDR3 SDRAM DIMM                     00CE03140815575042
memory-014   DIMM 1/2     ok        Samsung                 16384MB DDR3 SDRAM DIMM                     00CE0314081557534B
memory-015   DIMM 1/3     ok        Samsung                 16384MB DDR3 SDRAM DIMM                     00CE0314081557544F
psu-000      PS 0         ok        Delta Electronics       300-2299-01                                 C28249
psu-001      PS 1         ok        Delta Electronics       300-2299-01                                 C28227
slot-000     PCIe 1       ok        Oracle                  Sun Storage 6GB SAS Internal HBA            LSISAS2008ALLSGX-SAS6-INT-ZH3-25104-03BSP40743549
slot-001     PCIe 2       ok        PMC-Sierra              Oracle ZFS Storage SAS-2 6Gbs 16 port PCIe  4B02134F020
slot-002     PCIe 5       ok        Sun Microsystems, Inc.  2x8Gb Fibre Channel                         unknown
slot-003     PCIe 6       ok        Sun Microsystems, Inc.  2x8Gb Fibre Channel                         unknown
slot-004     PCIe 3       absent    -                       -                                           -
slot-005     PCIe 4       absent    -                       -                                           -

 

## The 'info.dump' view shows that only ~176GB (hw_physmem: 180214 MB) of the 256GB installed (16 x 16384MB DIMMs) is visible/in use:

## info.dump
{
    build_checksum: 709400929,
    build_date: Sat Dec 07 2013 01:14:41 GMT+0000 (UTC),
    build_host: 'fishbuild1',
    hw_asn: 'c0bfad5e-c10a-c4d6-b95d-dffd4ba4ca68',
    hw_csn: '1416NM200X',
    hw_physmem: 180214,            <<<<<<<<   NOT all installed memory 'seen' !!
    hw_product: 'Sun Netra X4270 M3',
    ak_debug: false,
    ak_product: 'SUNW,maguroG2',
    sp_version: '3.1.2.18',
    fw_release: '04/16/2013',
    fw_version: '21000214',
    fw_vendor: 'American Megatrends Inc.',
    os_isa: 'i386',
    os_platform: 'i86pc',
    os_bits: 64,
    os_debug: false,
    os_uptime: 1430544301,
    os_boot: Sat May 02 2015 05:25:01 GMT+0000 (UTC),
    os_machine: 'i86pc',
    os_version: 'ak/generic@2013.06.05.1.1,1-1.2',
    os_release: '5.11',
    os_nodename: 'fpzs3cont02',
    os_sysname: 'SunOS',
    ssl_version: 'OpenSSL 1.0.0k 5 Feb 2013',
    http_version: 'Apache/2.2.24 (Unix)',
    updated: Sat Apr 19 2014 05:16:39 GMT+0000 (UTC),
    installed: Sat Apr 19 2014 05:16:39 GMT+0000 (UTC)
}
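A quick arithmetic check makes the size of the gap concrete. This is a minimal sketch that assumes hw_physmem is reported in MB (consistent with the ~176GB figure above); the DIMM count and size are taken from the hardware view. Because the OS-visible figure is rarely exactly equal to the installed total, the "DIMMs' worth" result should be read as an approximate indicator only.

```python
# Compare installed DIMM capacity (hardware view) with the memory the OS
# reports (info.dump hw_physmem). hw_physmem is assumed to be in MB.
DIMM_COUNT = 16          # memory-000 .. memory-015, all reported "ok"
DIMM_SIZE_MB = 16384     # "16384MB DDR3 SDRAM DIMM"
HW_PHYSMEM_MB = 180214   # hw_physmem from info.dump


def memory_shortfall(dimm_count, dimm_size_mb, physmem_mb):
    """Return (installed_mb, missing_mb, missing expressed in DIMMs)."""
    installed_mb = dimm_count * dimm_size_mb
    missing_mb = installed_mb - physmem_mb
    return installed_mb, missing_mb, missing_mb / dimm_size_mb


installed, missing, dimms = memory_shortfall(DIMM_COUNT, DIMM_SIZE_MB, HW_PHYSMEM_MB)
print(f"installed: {installed} MB ({installed / 1024:.0f} GB)")
print(f"visible:   {HW_PHYSMEM_MB} MB ({HW_PHYSMEM_MB / 1024:.0f} GB)")
print(f"missing:   {missing} MB (~{dimms:.1f} DIMMs' worth)")
```

With the values above this prints 262144 MB (256 GB) installed, 180214 MB (176 GB) visible and 81930 MB missing, i.e. roughly five 16GB DIMMs that the OS is not using.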

 

$ egrep "%|FRU:" fltlog.txt | sort | uniq -c | sort -rn
1795   100%  fault.cpu.intel.quickpath.mem_scrub        Replaced
 732   100%  fault.cpu.intel.quickpath.mem_scrub
 249                FRU: hc://:chassis-mfg=Oracle:chassis-name=Sun-Netra-X4270-M3:chassis-part=32651707+10+1:chassis-serial=1416NM200X:fru-serial=15574F3C:fru-part=M393B2G70BH0-YK0/chassis=0/motherboard=0/chip=1/memory-controller=0/dram-channel=0/dimm=0
 105                FRU: hc://:chassis-mfg=Oracle:chassis-name=Sun-Netra-X4270-M3:chassis-part=32651707+10+1:chassis-serial=1416NM200X:fru-serial=15574828:fru-part=M393B2G70BH0-YK0/chassis=0/motherboard=0/chip=1/memory-controller=0/dram-channel=0/dimm=0
 315                FRU: hc://:chassis-mfg=Oracle:chassis-name=Sun-Netra-X4270-M3:chassis-part=32651707+10+1:chassis-serial=1416NM200X:fru-serial=1557534B:fru-part=M393B2G70BH0-YK0/chassis=0/motherboard=0/chip=1/memory-controller=0/dram-channel=1/dimm=0
 285                FRU: hc://:chassis-mfg=Oracle:chassis-name=Sun-Netra-X4270-M3:chassis-part=32651707+10+1:chassis-serial=1416NM200X:fru-serial=15575042:fru-part=M393B2G70BH0-YK0/chassis=0/motherboard=0/chip=1/memory-controller=0/dram-channel=1/dimm=1
 315                FRU: hc://:chassis-mfg=Oracle:chassis-name=Sun-Netra-X4270-M3:chassis-part=32651707+10+1:chassis-serial=1416NM200X:fru-serial=15574D61:fru-part=M393B2G70BH0-YK0/chassis=0/motherboard=0/chip=1/memory-controller=0/dram-channel=2/dimm=0
 476                FRU: hc://:chassis-mfg=Oracle:chassis-name=Sun-Netra-X4270-M3:chassis-part=32651707+10+1:chassis-serial=1416NM200X:fru-serial=15574286:fru-part=M393B2G70BH0-YK0/chassis=0/motherboard=0/chip=1/memory-controller=0/dram-channel=2/dimm=1
 315                FRU: hc://:chassis-mfg=Oracle:chassis-name=Sun-Netra-X4270-M3:chassis-part=32651707+10+1:chassis-serial=1416NM200X:fru-serial=15574828:fru-part=M393B2G70BH0-YK0/chassis=0/motherboard=0/chip=1/memory-controller=0/dram-channel=3/dimm=0
 467                FRU: hc://:chassis-mfg=Oracle:chassis-name=Sun-Netra-X4270-M3:chassis-part=32651707+10+1:chassis-serial=1416NM200X:fru-serial=155745AD:fru-part=M393B2G70BH0-YK0/chassis=0/motherboard=0/chip=1/memory-controller=0/dram-channel=3/dimm=1
   2   100%  fault.memory.intel.sb.dimm_ce
   2                FRU: hc://:product-id=Sun Netra X4270 M3:product-sn=1416NM200X:chassis-id=1416NM200X:server-id=ORACLESP-1416NM200X:serial=00CE03140815574F3C:part=001-0003-01,M393B2G70BH0-YK0/chassis=0/motherboard=0/chip=1/memory-controller=0/dram-channel=0/dimm=0
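The `egrep | sort | uniq -c` summary above is easier to reason about when the hc:// paths are broken down by position. The sketch below is a hypothetical helper, not an Oracle tool: it parses lines in the same "count  FRU: hc://..." shape and sums counts per (chip, memory-controller, dram-channel, dimm) and per FRU serial. The sample lines are three of the FRU entries shown above; the regular expressions are assumptions based only on the path format visible here. Tallying both ways can help spot cases like the one above, where the same position appears with more than one fru-serial.

```python
import re
from collections import Counter

# Template of the hc:// FRU path seen in fltlog.txt; only serial/channel/dimm vary here.
HC = ("hc://:chassis-mfg=Oracle:chassis-name=Sun-Netra-X4270-M3:chassis-part=32651707+10+1"
      ":chassis-serial=1416NM200X:fru-serial={serial}:fru-part=M393B2G70BH0-YK0"
      "/chassis=0/motherboard=0/chip=1/memory-controller=0/dram-channel={chan}/dimm={dimm}")

# Three FRU lines from the summary above, plus one fault-class line that is skipped.
SAMPLE = "\n".join([
    "1795   100%  fault.cpu.intel.quickpath.mem_scrub        Replaced",
    " 249                FRU: " + HC.format(serial="15574F3C", chan=0, dimm=0),
    " 105                FRU: " + HC.format(serial="15574828", chan=0, dimm=0),
    " 315                FRU: " + HC.format(serial="1557534B", chan=1, dimm=0),
])

LINE_RE = re.compile(r"^\s*(\d+)\s+FRU:\s+(.+)$")
POS_RE = re.compile(r"chip=(\d+)/memory-controller=(\d+)/dram-channel=(\d+)/dimm=(\d+)")
# Matches "fru-serial=" or a bare "serial=", but not "chassis-serial=".
SERIAL_RE = re.compile(r"(?<![\w-])(?:fru-)?serial=([^:/]+)")


def tally_frus(text):
    """Sum the uniq -c counts per (chip, controller, channel, dimm) and per FRU serial."""
    by_pos, by_serial = Counter(), Counter()
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # fault-class lines ("100%  fault....") carry no FRU path
        count, fru = int(m.group(1)), m.group(2)
        pos = POS_RE.search(fru)
        if pos:
            by_pos[tuple(int(g) for g in pos.groups())] += count
        serial = SERIAL_RE.search(fru)
        if serial:
            by_serial[serial.group(1)] += count
    return by_pos, by_serial


by_pos, by_serial = tally_frus(SAMPLE)
for (chip, mc, chan, dimm), n in sorted(by_pos.items()):
    print(f"chip={chip} mc={mc} channel={chan} dimm={dimm}: {n} fault lines")
```

Note that the fru-serial values in the fault log appear to be the trailing eight characters of the serials in the hardware view (for example 15574F3C and 00CE03140815574F3C), so matching the two outputs may require comparing suffixes.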

 

$ grep Location fmadm.out | sort | uniq -c | sort
 105      Location         : "/SYS/MB/P1/D0"
 119      Location         : "/SYS/MB/P1/D1"
 120      Location         : "/SYS/MB/P1/D3"
  95      Location         : "/SYS/MB/P1/D4"
 105      Location         : "/SYS/MB/P1/D5"
 105      Location         : "/SYS/MB/P1/D6"
  83      Location         : "/SYS/MB/P1/D7"
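The same Location tally can be reproduced in Python, which also makes it easy to list the P1 slots that never appear. This is a minimal sketch: the input lines are rebuilt from the counts above rather than read from a real fmadm.out, and the eight-slot D0 to D7 layout per CPU is an assumption based on the hardware view (DIMM 1/0 to DIMM 1/7).

```python
import re
from collections import Counter

LOC_RE = re.compile(r'Location\s*:\s*"([^"]+)"')


def tally_locations(lines):
    """Python equivalent of: grep Location fmadm.out | sort | uniq -c"""
    counts = Counter()
    for line in lines:
        m = LOC_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def slots_without_faults(counts, prefix="/SYS/MB/P1/D", slots=range(8)):
    """Slot labels under `prefix` (assumed D0-D7) that never appear in `counts`."""
    return [f"{prefix}{i}" for i in slots if f"{prefix}{i}" not in counts]


# Rebuild fmadm.out-style lines from the tally above (not a real fmadm.out).
TALLY = {"/SYS/MB/P1/D0": 105, "/SYS/MB/P1/D1": 119, "/SYS/MB/P1/D3": 120,
         "/SYS/MB/P1/D4": 95, "/SYS/MB/P1/D5": 105, "/SYS/MB/P1/D6": 105,
         "/SYS/MB/P1/D7": 83}
lines = [f'    Location         : "{loc}"' for loc, n in TALLY.items() for _ in range(n)]

counts = tally_locations(lines)
print(sum(counts.values()))          # 732
print(slots_without_faults(counts))  # ['/SYS/MB/P1/D2']
```

The total of 732 entries matches the count on the mem_scrub line without a "Replaced" status in the fault-log summary, and only /SYS/MB/P1/D2 is absent from the list.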

 

 

Cause

This may be the issue described in Bug 20630617:

  CR 20630617 (EDX4-2 DB memory was not online after memory replacement and fault cleared)

 

 

Solution

The documented DIMM replacement procedure in the service manual is:

    1)  Power off the host.

    2)  In the ILOM CLI or web interface, "set /SP/policy HOST_AUTO_POWER_ON=disabled".

    3)  Disconnect AC power and replace the faulted DIMMs.

    4)  Reconnect AC power and wait about 5 minutes for the ILOM to finish booting.

    5)  Power on the host and restore "/SP/policy HOST_AUTO_POWER_ON" to its original value.
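The five steps above are easy to perform out of order under pressure (for example, powering the host on before the ILOM has finished booting, or forgetting to restore the policy). A simple checklist can help. The sketch below is a hypothetical helper that only builds the ordered list; it does not talk to an ILOM. The ILOM CLI targets `stop /SYS`, `start /SYS` and `show /SP/policy HOST_AUTO_POWER_ON`, and the "enabled"/"disabled" values, are assumptions based on ILOM 3.x on x86 hosts; verify them against the service manual for your platform.

```python
from dataclasses import dataclass

# Assumed valid values for the ILOM HOST_AUTO_POWER_ON policy property.
VALID_POLICY_VALUES = {"enabled", "disabled"}


@dataclass(frozen=True)
class Step:
    kind: str  # "ilom" = ILOM CLI command, "manual" = physical action by the engineer
    text: str


def dimm_replacement_runbook(original_auto_power_on):
    """Return the five documented steps as an ordered checklist.

    `original_auto_power_on` is the value recorded beforehand (for example with
    `show /SP/policy HOST_AUTO_POWER_ON`) so it can be restored at the end.
    """
    if original_auto_power_on not in VALID_POLICY_VALUES:
        raise ValueError(f"unexpected policy value: {original_auto_power_on!r}")
    return [
        Step("ilom", "stop /SYS"),                                    # 1) power off host
        Step("ilom", "set /SP/policy HOST_AUTO_POWER_ON=disabled"),   # 2) block auto power-on
        Step("manual", "Disconnect AC power; replace the faulted DIMMs"),        # 3)
        Step("manual", "Reconnect AC power; wait ~5 minutes for ILOM to boot"),  # 4)
        Step("ilom", "start /SYS"),                                   # 5) power on host
        Step("ilom", f"set /SP/policy HOST_AUTO_POWER_ON={original_auto_power_on}"),  # 5) restore
    ]


for n, step in enumerate(dimm_replacement_runbook("enabled"), start=1):
    print(f"{n}. [{step.kind}] {step.text}")
```

Recording the original policy value as an explicit parameter, rather than assuming "enabled", keeps step 5 from silently changing the system's power-on behaviour.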

 

 

***Checked for relevance on 30-MAY-2018***

References

<NOTE:1019887.1> - Sun Storage 7000 Unified Storage System: How to Collect a Support Bundle using the BUI or CLI

  Copyright © 2018 Oracle, Inc.  All rights reserved.