Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution (Sure Solution)

1618570.1 : When an HDFS Disk Goes Bad on an Oracle Big Data Appliance Node, the /root Directory Is Filled up with HDFS Data
Created from <SR 3-8342648851>

Applies to:
Big Data Appliance X3-2 Hardware - Version All Versions and later
Big Data Appliance Integrated Software - Version 2.1.0 and later
Linux x86-64

Symptoms
When an HDFS disk goes bad and is unmounted (perhaps by a reboot) on one of the Oracle Big Data Appliance (BDA) nodes, the /root file system gets filled up with HDFS data. The failed disk reports a state like the following:

Slot Number: 11
Firmware state: Unconfigured(bad)
Foreign State: Foreign
Foreign Secure: Drive is not secured by a foreign lock key
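To see this state from the operating system, the commands below are one possible check (a minimal sketch, assuming the MegaCli64 utility is in its usual location under /opt/MegaRAID/MegaCli on the BDA node and that the failed disk is in slot 11, as in this example):

# List physical drive states and look for "Firmware state: Unconfigured(bad)"
/opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | egrep "Slot Number|Firmware state|Foreign State"

# Confirm that the data directory for the failed disk (/u12 for slot 11) is no longer mounted
mount | grep /u12

# Watch root filesystem usage grow while the DataNode keeps writing
df -h /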
Changes
The disk went bad, but ASR did not raise an SR due to internal Bug 18139689. A reboot was performed on the node, and because the disk is bad, it was unmounted during startup.

Bug 18139689 - BDA DOES NOT RAISE ASR EVENTS FOR ALL UNHEALTHY DISK STATES

Cause
As per Cloudera, data is written to the root filesystem by the DataNode (DN) under the following conditions. Say the disk in slot 11 (/u12) fails or turns bad:

- The /dev/sdl1 disk fails
- The DN marks the /u12/hadoop/dfs volume as failed and stops writing to it
- The administrator stops the DN
- The administrator unmounts /u12
- The administrator starts the DN from Cloudera Manager (CM) (note that some of the previous steps could have been performed automatically by a reboot of the host)
- On BDA, /u12 has 755 permissions and is owned by root. The CM agent creates the /u12/hadoop/dfs directory because it finds /u12 empty, and it also recursively changes the ownership of /u12/hadoop to the hdfs user.
- The DN gets started by the CM agent, sees that /u12/hadoop/dfs is empty, or "unformatted", and formats the "volume". It then starts normal operations and uses the newly formatted volume. Because /u12 is no longer a mount point but an ordinary directory on the root filesystem, this ends up writing to /root (see the commands sketched after this list for one way to confirm this on the node).
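The sequence above can be verified on the affected node; the commands below are a sketch only, assuming the slot 11 / /u12 layout used in this example:

# With the disk unmounted, /u12 is just an empty directory on the / filesystem
mountpoint /u12        # reports "/u12 is not a mountpoint" in this scenario

# The CM agent will have recreated /u12/hadoop/dfs and chowned /u12/hadoop to hdfs
ls -ld /u12 /u12/hadoop /u12/hadoop/dfs

# Space consumed on the root filesystem by the newly "formatted" volume
du -sh /u12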
Solution
Internal Bug 18139715 has been filed to change permissions on unmounted data directories so that hdfs cannot write to /root in the case of a disk failure (a sketch of this permissions approach follows the Attachments note below). Bug 18139715 is targeted to be fixed in the 2.4.1 release.

Bug 18139715 - BDA SHOULD MAKE UNMOUNTED DATA DIRECTORIES UNREADABLE (PERMISSIONS 700)

ASR Bug 18139689, to report all disk failure states, is also targeted to be fixed in the 2.4.1 release.

Meanwhile, the bad disk status can be checked in Cloudera Manager (CM). CM shows the DataNode on which the disk failed as having "bad health". The message is: "The data node has 1 volume of failure(s). Critical threshold any."

Attachments
This solution has no attachment
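For reference, the intent of Bug 18139715 (making an unmounted data directory unreadable) can be approximated manually. The commands below are an assumption for illustration, not part of the official fix, and would only apply while the DataNode is stopped and the data directory is unmounted:

# Make the empty mount-point directory unwritable by the hdfs user so a restarted
# DataNode cannot "format" it on the root filesystem (mirrors the 700-permissions idea
# in Bug 18139715; the /u12 path and root ownership are taken from the example in this note)
chown root:root /u12
chmod 700 /u12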