Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution

1580223.1 : Running bdacheckcluster/bdacheckhw Fails on Oracle Big Data Appliance: "ERROR: Wrong disk status:Online,Spun Up Yes alert"
Applies to:
Big Data Appliance X3-2 Starter Rack - Version All Versions and later
Big Data Appliance X3-2 Full Rack - Version All Versions and later
Big Data Appliance Hardware - Version All Versions and later
Big Data Appliance X3-2 In-Rack Expansion - Version All Versions and later
Linux x86-64

Symptoms
1. Running the bdacheckcluster utility to verify the health of the BDA cluster raises "ERROR: Hardware checks failing" for one or more servers in the cluster:

# bdacheckcluster
...
INFO: Checking hardware on host bdanode0n
...
ERROR: Hardware checks failing on host bdanode0n
...
ERROR: Big Data Appliance failed cluster health checks

# bdacheckhw
...
SUCCESS: Correct disk 1 status : Online, Spun Up No alert
SUCCESS: Correct disk 2 status : Online, Spun Up No alert
...
ERROR: Wrong disk 5 status : Online, Spun Up Yes alert
INFO: Expected disk 5 status : Online, Spun Up No alert
SUCCESS: Correct disk 6 status : Online, Spun Up No alert
...
SUCCESS: Correct disk 11 status : Online, Spun Up No alert
INFO: Errors reported on disk 5 : 12 0
SUCCESS: Correct number of virtual disks : 12
...
ERROR: Big Data Appliance failed hardware validation checks

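On a node with many disks it can help to pull only the failing entries out of the bdacheckhw report. The following is a minimal sketch, assuming the report text has been captured to a string and that its lines follow the format shown in the example above. The function name `failing_disks` and the sample text are illustrative and not part of any Oracle tool.

```python
import re

# Matches lines such as "ERROR: Wrong disk 5 status : Online, Spun Up Yes alert".
WRONG_DISK = re.compile(r"ERROR: Wrong disk (\d+) status\s*:\s*(.+)")


def failing_disks(report):
    """Return {disk number: reported status} for every 'Wrong disk' error line."""
    found = {}
    for line in report.splitlines():
        match = WRONG_DISK.search(line)
        if match:
            found[int(match.group(1))] = match.group(2).strip()
    return found


# Sample text copied from the bdacheckhw excerpt above (assumed line format).
sample = """\
SUCCESS: Correct disk 1 status : Online, Spun Up No alert
ERROR: Wrong disk 5 status : Online, Spun Up Yes alert
INFO: Expected disk 5 status : Online, Spun Up No alert
SUCCESS: Correct disk 6 status : Online, Spun Up No alert
"""

print(failing_disks(sample))  # {5: 'Online, Spun Up Yes alert'}
```
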
# MegaCli64 LdPdInfo a0
# MegaCli64 pdlist a0

These commands show output like the following. In this example a problem is reported for the disk in slot 5:

Virtual Drive: 5 (Target Id: 5)
...
Media Error Count: 12
Other Error Count: 0
Predictive Failure Count: 5
Last Predictive Failure Event Seq Number: 15137
...
Firmware state: Online, Spun Up
...
Drive has flagged a S.M.A.R.T alert : Yes

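The two fields that matter in this output, Predictive Failure Count and the S.M.A.R.T alert flag, can be checked programmatically. The sketch below assumes the MegaCli64 output has been saved to a string and that each section begins with a "Virtual Drive:" line as in the excerpt. The function name `drives_at_risk` and the entry for drive 4 in the sample are hypothetical. Media Error Count is deliberately not evaluated.

```python
import re

VIRTUAL_DRIVE = re.compile(r"Virtual Drive:\s*(\d+)")
PREDICTIVE = re.compile(r"Predictive Failure Count:\s*(\d+)")
SMART = re.compile(r"Drive has flagged a S\.M\.A\.R\.T alert\s*:\s*(\w+)")


def drives_at_risk(ldpdinfo_output):
    """Return sorted virtual drive numbers with predictive failures or a SMART alert."""
    at_risk = set()
    current = None  # virtual drive whose section is currently being read
    for line in ldpdinfo_output.splitlines():
        header = VIRTUAL_DRIVE.search(line)
        if header:
            current = int(header.group(1))
            continue
        if current is None:
            continue  # ignore anything before the first section header
        predictive = PREDICTIVE.search(line)
        if predictive and int(predictive.group(1)) > 0:
            at_risk.add(current)
        smart = SMART.search(line)
        if smart and smart.group(1) == "Yes":
            at_risk.add(current)
    return sorted(at_risk)


# Drive 5 is taken from the excerpt; drive 4 is an invented healthy entry.
sample = """\
Virtual Drive: 4 (Target Id: 4)
Media Error Count: 0
Predictive Failure Count: 0
Drive has flagged a S.M.A.R.T alert : No
Virtual Drive: 5 (Target Id: 5)
Media Error Count: 12
Predictive Failure Count: 5
Last Predictive Failure Event Seq Number: 15137
Drive has flagged a S.M.A.R.T alert : Yes
"""

print(drives_at_risk(sample))  # [5]
```
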
The Media Error Count can be ignored: these are not failures but recoverable read/write errors. The disk has a firmware state of Online, Spun Up, but it also exhibits predictive failures and a SMART alert:

Predictive Failure Count: 5
Drive has flagged a S.M.A.R.T alert : Yes

# lsscsi
[0:0:20:0] enclosu SUN HYDE12 0341 -
[0:2:0:0] disk LSI MR9261-8i 2.12 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.12 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.12 /dev/sdc
[0:2:3:0] disk LSI MR9261-8i 2.12 /dev/sdd
[0:2:4:0] disk LSI MR9261-8i 2.12 /dev/sde
[0:2:5:0] disk LSI MR9261-8i 2.12 /dev/sdf
[0:2:6:0] disk LSI MR9261-8i 2.12 /dev/sdg
[0:2:7:0] disk LSI MR9261-8i 2.12 /dev/sdh
[0:2:8:0] disk LSI MR9261-8i 2.12 /dev/sdi
...

# mount -l
/dev/md2 on / type ext3 (rw,noatime)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/md0 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
/dev/sda4 on /u01 type ext4 (rw,nodev,noatime) [/u01]
/dev/sdb4 on /u02 type ext4 (rw,nodev,noatime) [/u02]
/dev/sdc1 on /u03 type ext4 (rw,nodev,noatime) [/u03]
/dev/sdd1 on /u04 type ext4 (rw,nodev,noatime) [/u04]
/dev/sde1 on /u05 type ext4 (rw,nodev,noatime) [/u05]
/dev/sdf1 on /u06 type ext4 (rw,nodev,noatime) [/u06]
/dev/sdg1 on /u07 type ext4 (rw,nodev,noatime) [/u07]
/dev/sdh1 on /u08 type ext4 (rw,nodev,noatime) [/u08]
/dev/sdi1 on /u09 type ext4 (rw,nodev,noatime) [/u09]
...

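The lsscsi and mount listings can be combined to see which device and mount point sit behind the affected virtual drive. The sketch below rests on an assumption that this note does not state explicitly: that the third field of the lsscsi address ([host:channel:target:lun]) equals the Target Id reported by MegaCli64. The function name `mounts_for_target` is hypothetical, and the sample lines are copied from the listings above.

```python
import re

# lsscsi line, e.g. "[0:2:5:0] disk LSI MR9261-8i 2.12 /dev/sdf"
LSSCSI = re.compile(r"\[\d+:\d+:(\d+):\d+\]\s+disk\s+.*\s(/dev/\S+)\s*$")
# mount line, e.g. "/dev/sdf1 on /u06 type ext4 (rw,nodev,noatime) [/u06]"
MOUNT = re.compile(r"^(/dev/\S+) on (\S+) type ")


def mounts_for_target(target, lsscsi_output, mount_output):
    """Return (device, [mount points]) for a SCSI target id, or (None, [])."""
    device = None
    for line in lsscsi_output.splitlines():
        match = LSSCSI.search(line)
        if match and int(match.group(1)) == target:
            device = match.group(2)
            break
    if device is None:
        return None, []
    mounts = []
    for line in mount_output.splitlines():
        match = MOUNT.match(line.strip())
        # A partition is the device name followed by digits, e.g. /dev/sdf1.
        if match and re.fullmatch(re.escape(device) + r"\d+", match.group(1)):
            mounts.append(match.group(2))
    return device, mounts


lsscsi_sample = """\
[0:0:20:0] enclosu SUN HYDE12 0341 -
[0:2:4:0] disk LSI MR9261-8i 2.12 /dev/sde
[0:2:5:0] disk LSI MR9261-8i 2.12 /dev/sdf
"""
mount_sample = """\
/dev/md2 on / type ext3 (rw,noatime)
/dev/sde1 on /u05 type ext4 (rw,nodev,noatime) [/u05]
/dev/sdf1 on /u06 type ext4 (rw,nodev,noatime) [/u06]
"""

print(mounts_for_target(5, lsscsi_sample, mount_sample))  # ('/dev/sdf', ['/u06'])
```

If the assumption holds, the disk in slot 5 of this example corresponds to /dev/sdf, mounted on /u06.
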
The file <bdadiag...>/raid/megacli64-GetEvents-all.out shows when the Predictive Failure was first raised, which provides a timeline of how long the disk has been in this state. In this example the Predictive Failure for the disk in slot 5 was first reported at:

Time: Sat Aug 28 18:40:30 2013
Code: 0x00000060
Class: 1
Locale: 0x02
Event Description: Predictive failure: PD 15(e0x14/s2)
Event Data:
===========
Device ID: 21
Enclosure Index: 20
Slot Number: 5

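Finding the first predictive failure event for a slot by hand is tedious in a long event log. The following is a minimal sketch, assuming every event record begins with a "Time:" line followed by its "Event Description:" and "Slot Number:" lines, as in the record above. The function name `first_predictive_failure` is hypothetical, and the second event in the sample is invented for illustration.

```python
import re
from datetime import datetime

# Captures the timestamp without its weekday, e.g. "Aug 28 18:40:30 2013".
TIME = re.compile(r"^Time:\s*\w{3} (\w{3}\s+\d+ \d\d:\d\d:\d\d \d{4})")
SLOT = re.compile(r"^Slot Number:\s*(\d+)")


def first_predictive_failure(events_text, slot):
    """Return the earliest predictive failure time for a slot, or None."""
    earliest = None
    when = None          # timestamp of the record being read
    predictive = False   # True once the record is known to be a predictive failure
    for raw in events_text.splitlines():
        line = raw.strip()
        stamp = TIME.match(line)
        if stamp:
            # A "Time:" line starts a new record (assumed record layout).
            text = " ".join(stamp.group(1).split())
            when = datetime.strptime(text, "%b %d %H:%M:%S %Y")
            predictive = False
        elif line.startswith("Event Description: Predictive failure"):
            predictive = True
        else:
            found = SLOT.match(line)
            if found and predictive and when and int(found.group(1)) == slot:
                if earliest is None or when < earliest:
                    earliest = when
    return earliest


# First record is from the excerpt; the second is an invented later repeat.
sample = """\
Time: Sat Aug 28 18:40:30 2013
Code: 0x00000060
Event Description: Predictive failure: PD 15(e0x14/s2)
Slot Number: 5
Time: Sun Aug 29 02:10:05 2013
Code: 0x00000060
Event Description: Predictive failure: PD 15(e0x14/s2)
Slot Number: 5
"""

print(first_predictive_failure(sample, 5))  # 2013-08-28 18:40:30
```
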
Cause
A predictive failure due to a SMART alert indicates a failing drive that should be replaced. A disk showing predictive failures or SMART alerts is still usable, but it has a high chance of becoming unusable in the near future.

Solution
File a Service Request with Oracle Support to have the disk replaced as soon as possible.

References
<NOTE:1516469.1> - Oracle Big Data Appliance Diagnostic Information Collection with bdadiag V2.*/V3.*/V4.*

Attachments
This solution has no attachment