Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1985677.1
Update Date:2018-04-25

Solution Type  Problem Resolution Sure

Solution  1985677.1 :   Running bdacheckcluster/bdacheckhw on Oracle Big Data Appliance Reports: "INFO: Errors reported on disk <N> : 0 1"


Related Items
  • Big Data Appliance X4-2 Hardware
Related Categories
  • PLA-Support>Eng Systems>BDA>Big Data Appliance>DB: BDA_EST




In this Document
Symptoms
Cause
Solution


Created from <SR 3-10321759151>

Applies to:

Big Data Appliance X4-2 Hardware - All Versions
Linux x86-64

Symptoms


Running bdacheckcluster or bdacheckhw on Oracle Big Data Appliance reports: "INFO: Errors reported on disk <N> : 0 1"

# bdacheckcluster
...
INFO: Starting cluster host hardware checks
bdanode0x: INFO: Errors reported on disk <N> : 0 1
SUCCESS: All cluster hosts pass hardware checks
...
SUCCESS: Big Data Appliance cluster health checks succeeded
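When the check output is long, the informational line is easy to miss because the overall result is still SUCCESS. A minimal Python sketch for picking such lines out of saved output follows. The function name and the sample host name are hypothetical, and the reading of the two trailing numbers as Media and Other error counts is an assumption based on the MegaCli64 values shown in this note.

```python
import re

# Matches lines such as "bdanode01: INFO: Errors reported on disk 3 : 0 1".
# Assumption: the two trailing integers are the Media and Other error counts.
LINE_RE = re.compile(
    r"^(?P<host>\S+): INFO: Errors reported on disk\s+(?P<disk>\S+)\s*:"
    r"\s*(?P<media>\d+)\s+(?P<other>\d+)\s*$"
)

def find_disk_error_lines(output):
    """Return (host, disk, media_count, other_count) for each matching line."""
    hits = []
    for line in output.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            hits.append((m["host"], m["disk"], int(m["media"]), int(m["other"])))
    return hits

# Illustrative sample only; host name and slot number are made up.
sample = """INFO: Starting cluster host hardware checks
bdanode01: INFO: Errors reported on disk 3 : 0 1
SUCCESS: All cluster hosts pass hardware checks"""

print(find_disk_error_lines(sample))
```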



Running "MegaCli64 pdlist a0" on node bdanode0x shows a non-zero "Other Error Count" for the disk in slot <N>:

Enclosure Device ID: 8
Slot Number: 3
Drive's postion: DiskGroup: 3, Span: 0, Arm: 0
Enclosure position: 0
Device Id: 15
WWN: 5000CCA05C9839DB
Sequence Number: 2
Media Error Count: 0
Other Error Count: 1
Predictive Failure Count: 0
....
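To review every slot at once, the saved pdlist text can be scanned for non-zero counters. The following is a rough Python sketch, not an Oracle tool. It assumes each drive record begins with an "Enclosure Device ID" line, as in the excerpt above, and the function names are hypothetical.

```python
def parse_pdlist(text):
    """Parse 'Key: Value' lines into one dict per physical drive.

    Assumption: every drive record starts with 'Enclosure Device ID'.
    """
    drives = []
    current = None
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # skip blank lines and lines without a colon
        key, value = key.strip(), value.strip()
        if key == "Enclosure Device ID":
            current = {}
            drives.append(current)
        if current is not None:
            current[key] = value
    return drives

def drives_with_other_errors(drives):
    """Return (slot, other_error_count) for drives with a non-zero count."""
    return [
        (d.get("Slot Number"), int(d["Other Error Count"]))
        for d in drives
        if int(d.get("Other Error Count", 0)) > 0
    ]

# Illustrative sample; the second record is made up for contrast.
sample = """Enclosure Device ID: 8
Slot Number: 3
Media Error Count: 0
Other Error Count: 1
Predictive Failure Count: 0

Enclosure Device ID: 8
Slot Number: 4
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
"""

print(drives_with_other_errors(parse_pdlist(sample)))
```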

 

Cause

An "Other Error Count" value of 1 does not mean that the disk has failed, or even that it is going to fail. It is an informational message indicating that the disk should be monitored.

Solution


The "Other Error Count" is not a criterion for disk replacement.

The "Other Error Count" records errors related to the HBA, cabling, topology, and other causes that are not specific to the disk itself.

Hence, if the "Other Error Count" continues to rise, it indicates that some other (non-disk) problem is present, which may need to be investigated, identified, and fixed.
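One way to tell whether a count "continues to rise" is to compare readings taken at different times. The sketch below is illustrative only: it assumes the counts have already been collected into dictionaries keyed by slot number, and the function name is hypothetical.

```python
def rising_other_errors(previous, current):
    """Return {slot: (old_count, new_count)} for slots whose count increased.

    Both arguments map a slot number to its "Other Error Count".
    A slot missing from the earlier reading is treated as 0.
    """
    return {
        slot: (previous.get(slot, 0), new)
        for slot, new in current.items()
        if new > previous.get(slot, 0)
    }

# Made-up readings from two different days.
earlier = {"3": 1, "4": 0}
later = {"3": 4, "4": 0}

print(rising_other_errors(earlier, later))
```

A count that stays at the same value between readings is not flagged; only an increase suggests an ongoing non-disk problem worth investigating.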

Note: A list of key code qualifiers to help interpret events can be found in the public Key Code Qualifier discussion.

1. Check the "bdadiag snapshot" logs for the megacli getevents -all output.
2. Look for an "Unexpected sense" Event Description such as:
Event Description: Unexpected sense: PD 0c(*/s10) Path *, CDB:*** , Sense: b/47/01
3. Here the sense data b/47 (sense key b, additional sense code 47) indicates a command aborted due to a SCSI parity error (refer to the public Key Code Qualifier table at http://en.wikipedia.org/wiki/Key_Code_Qualifier ). This can occur for several reasons, only one of which is a hardware issue with the disk.
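The sense triplet at the end of the event line can be split into its sense key, additional sense code (ASC), and qualifier (ASCQ) before it is looked up in the Key Code Qualifier table. The following Python sketch shows one way to do that. The function name is hypothetical, and the two lookup tables contain only the entries discussed in this note, so any other code prints as "unknown".

```python
import re

# Only the values discussed in this note; consult the Key Code Qualifier
# table for anything else.
SENSE_KEY_NAMES = {0x0B: "Aborted Command"}
ASC_NAMES = {0x47: "SCSI parity error"}

# Matches the trailing "Sense: key/asc/ascq" hexadecimal triplet.
SENSE_RE = re.compile(r"Sense:\s*([0-9a-fA-F]+)/([0-9a-fA-F]+)/([0-9a-fA-F]+)")

def decode_sense(event_description):
    """Return (sense_key, asc, ascq) as integers, or None if not found."""
    m = SENSE_RE.search(event_description)
    if m is None:
        return None
    key, asc, ascq = (int(group, 16) for group in m.groups())
    return key, asc, ascq

event = "Unexpected sense: PD 0c(*/s10) Path *, CDB:*** , Sense: b/47/01"
key, asc, ascq = decode_sense(event)
print(
    f"sense key 0x{key:X} ({SENSE_KEY_NAMES.get(key, 'unknown')}), "
    f"ASC 0x{asc:02X} ({ASC_NAMES.get(asc, 'unknown')}), "
    f"ASCQ 0x{ascq:02X}"
)
```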


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.