Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1541199.1
Update Date:2016-09-05
Keywords:

Solution Type  Problem Resolution Sure

Solution  1541199.1 :   Exachk Failing For Cell Server "Hardware And Firmware Profile Check Is Not Successful On One Or More"  


Related Items
  • Oracle Exadata Storage Server Software
  • Exadata X3-2 Half Rack
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>Oracle Exadata>DB: Exadata_EST




Created from <SR 3-6982326841>

Applies to:

Exadata X3-2 Half Rack - Version All Versions to All Versions [Release All Releases]
Oracle Exadata Storage Server Software - Version 11.2.3.2.0 to 11.2.3.2.0 [Release 11.2]
Information in this document applies to any platform.

Symptoms

Exadata Server software version: 11.2.3.2.0.120713

The node was rebooted, and on reboot an NTP issue was discovered that had adjusted the time backwards.

Example from: /var/log/messages:
Apr  3 13:23:54 dm01cel20 shutdown[3201]: shutting down for system halt
Apr  3 13:23:54 dm01cel20 init: Switching to runlevel: 0
Apr  3 13:24:09 dm01cel20 snmpd[7119]: Received TERM or STOP signal...  shutting down...
Apr  3 13:24:09 dm01cel20 ntpd[7213]: ntpd exiting on signal 15
Apr  3 13:24:09 dm01cel20 auditd[6826]: The audit daemon is exiting.
Apr  3 13:24:09 dm01cel20 kernel: type=1305 audit(1365020649.942:419990): audit_pid=0 old=6826 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1
Apr  3 13:24:10 dm01cel20 kernel: type=1305 audit(1365020650.053:419991): user pid=3887 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditctl_t:s0 audit_enabled=2 res=0
Apr  3 13:24:10 dm01cel20 kernel: Kernel logging (proc) stopped.
Apr  3 13:24:10 dm01cel20 kernel: Kernel log daemon terminating.
Apr  3 13:24:11 dm01cel20 exiting on signal 15
Jul 15 08:58:25 dm01cel20 syslogd 1.4.1: restart.   <<<================== wrong date on Reboot

 

When attempting to run exachk, which calls the CheckHWnFWProfile script, the following message is displayed:
Hardware and firmware profile check not successful on one or more storage servers.

[WARNING]The hardware and firmware are not supported. See details below

Requires:
All All 06.05.10.00 ELP-4x100-4d-n All All TI35 All_of_16

Found:
No_Aura2_Flash_Exists



Changes

 OS restarted.

Cause

A code defect in the CheckHWnFWProfile script caused it to fail to recognize the flash cards when the device links carry a wrong timestamp. The logic in the internal scripts had already been changed because of earlier bugs, so the internal flash handling is not affected by a wrong link timestamp.
 
The following evidence shows how the issue relates to this specific customer:

1. The following links show a timestamp of "May 7 2005", which is incorrect.

# ls -l /dev/disk/by-path/pci-0000*sas*
lrwxrwxrwx 1 root root  9 May  7  2005 /dev/disk/by-path/pci-0000:20:00.0-sas-8:0:0:0:1:-0x4433221104000000:0 -> ../../sdn

2. The messages file shows the system reboot. When the system came back up, the clock reported the wrong time, and the SCSI device links were created with that timestamp.

Jan 22 11:53:03 xxx exiting on signal 15                                                  
May  7 11:14:06 xxx syslogd 1.4.1: restart.
May  7 11:14:07 xxx kernel: rtc_cmos 00:06: setting system clock to 2005-05-07 01:13:13 UTC (1115428393)  


3. Because the logic in the file scripts_aura.sh was changed to address some earlier bugs, the wrong timestamp did not affect the identification of the flashdisk devices, and flashcache was not affected.

Note: This is fixed in 11.2.3.2.1
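
As background to the evidence above: "ls -l" prints a year in place of hours and minutes when a file's modification time is more than about six months away from the current time, which is why the link in item 1 shows "May 7 2005". The following is a minimal sketch, not part of the original note, for listing by-path links old enough to be displayed that way. The function name list_old_sas_links and the 180-day default are assumptions chosen for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not from the original note): list SAS by-path
# symlinks whose own modification time is older than a given number of days.
# find does not follow symlinks by default, so -mtime tests the link itself.
list_old_sas_links() {
    local dir=${1:-/dev/disk/by-path}   # directory holding the links
    local days=${2:-180}                # assumed threshold, roughly six months
    find "$dir" -maxdepth 1 -type l -name 'pci-0000*sas*' -mtime +"$days"
}
```

Calling list_old_sas_links with no arguments inspects /dev/disk/by-path. Any path it prints is a link that "ls -l" would show with a year instead of a time.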
 

Solution

1. Validate that the griddisks can be inactivated. The cell is running in writethrough mode, so no extra validation is required.

cellcli> list griddisk attributes name,asmmodestatus,asmdeactivationoutcome

If asmdeactivationoutcome is Yes for all griddisks, continue:

cellcli> alter griddisk all inactive
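
The check in step 1 can also be scripted rather than read by eye. The sketch below is an illustration only: the function name all_griddisks_deactivatable is hypothetical, and it assumes the list command prints one griddisk per line with whitespace-separated columns in the order name, asmmodestatus, asmdeactivationoutcome.

```shell
#!/bin/bash
# Hypothetical helper (not from the original note): read the output of
#   list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
# on stdin and succeed only if every griddisk reports "Yes" in column 3.
# Assumes whitespace-separated columns, one griddisk per line.
all_griddisks_deactivatable() {
    awk '
        NF >= 3 {
            seen = 1
            if (tolower($3) != "yes") {
                print "NOT SAFE TO INACTIVATE: " $1
                bad = 1
            }
        }
        # Fail if any griddisk was not "Yes" or if no griddisk lines were read.
        END { if (bad || !seen) exit 1 }
    '
}
```

For example, the output of cellcli -e "list griddisk attributes name,asmmodestatus,asmdeactivationoutcome" can be piped into all_griddisks_deactivatable, and the inactivation run only when the function returns success.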

2. Stop all services

cellcli> alter cell shutdown services all

3. At the OS level, as root, execute the following script:

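#!/bin/bash
# Recreate each SAS by-path symlink so that it gets the current timestamp.
ls -l /dev/disk/by-path/pci-0000*sas* | while read -r i
do
    # In "ls -l" output, field 9 is the link name and field 11 is its target.
    target=$(echo "$i" | awk '{print $11}')
    linkname=$(echo "$i" | awk '{print $9}')
    rm "$linkname"
    ln -v -s "$target" "$linkname"
done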

4. Validate that the flashdisk device links have the correct timestamp by executing:

# ls -l /dev/disk/by-path/pci-0000*sas*

Note that the reported timestamp should include the day, hours, and minutes and should show the current time, for example:

# ls -lrt /dev/disk/by-path/pci-0000*sas*
lrwxrwxrwx 1 root root 9 Mar 24 20:46 /dev/disk/by-path/pci-0000:1b:00.0-sas-0x5080020000929863:1:1-0x1221000001000000:1 -> ../../sdo
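
The validation in step 4 can be automated with a short check such as the one below. This is a sketch under stated assumptions, not part of the original procedure: the function name check_links_recent and the one-hour default window are hypothetical, and it relies on GNU stat, which reports the symlink's own modification time unless -L is given.

```shell
#!/bin/bash
# Hypothetical helper (not from the original note): verify that every SAS
# by-path symlink was modified within the last max_age seconds.
# A timestamp in the future is also reported as a problem.
check_links_recent() {
    local dir=${1:-/dev/disk/by-path}
    local max_age=${2:-3600}            # assumed window: one hour
    local now mtime age link rc=0 count=0
    now=$(date +%s)
    for link in "$dir"/pci-0000*sas*; do
        [ -L "$link" ] || continue      # skip non-links and an unmatched glob
        count=$((count + 1))
        mtime=$(stat -c %Y "$link")     # mtime of the link itself, in epoch seconds
        age=$((now - mtime))
        if [ "$age" -lt 0 ] || [ "$age" -gt "$max_age" ]; then
            echo "WRONG TIMESTAMP: $link ($(stat -c %y "$link"))"
            rc=1
        fi
    done
    if [ "$count" -eq 0 ]; then
        echo "No matching links found in $dir"
        rc=1
    fi
    return $rc
}
```

Running check_links_recent immediately after the script in step 3 should print nothing and return success. Any line it prints identifies a link that still carries an old or future timestamp.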


5. Restart the services

cellcli> alter cell restart services all

6. Activate griddisks

cellcli> alter griddisk all active

 

References

<BUG:16029817> - CHECKHWNFWPROFILE FAILED TO DETECT AURA2 FLASH AFTER 6 MONTH

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.