Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure
Solution 1664436.1: Storage Server in continuous reboot due to kernel panic and steps to reimage
Created from <SR 3-8773210011>

Applies to:
Oracle Exadata Storage Server Software - Version 11.1.0.3.0 to 12.1.1.1.0 [Release 11.1 to 12.1]
Exadata X3-2 Hardware - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms
The battery for the LSI disk controller was replaced on one of the storage cells. After the replacement, the storage cell reboots continuously. The same symptoms may also occur without any prior hardware maintenance activity.

Cause
The ILOM snapshot, /SP/console, or the ILOM Remote Console shows messages indicating a kernel panic just before the server reboots.

The ilom/@persist@hostconsole.log file collected via the ILOM snapshot will contain lines similar to the following when the LSI HBA controller is properly detected, along with its 12 physical disks. Note that the disk manufacturer may differ from the one shown below:
megasas: 06.505.02.00 Wed. Nov. 14 17:00:00 PDT 2012 <<<<<<<<<<<<<<<< megasas kernel module loaded
...
scsi 0:0:8:0: Direct-Access SEAGATE ST32000SSSUN2.0T 061A PQ: 0 ANSI: 5 <<<<<<<< Disks detected
If entries similar to those above do appear, the controller and its disks are being detected, so we are likely dealing with corruption of the root filesystem. This generally requires that the cell be reimaged from the internal USB.
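The log check described above can be sketched as a small shell helper. This is a minimal sketch, not an Oracle-provided tool: the function name check_console_log is hypothetical, and it assumes the console log contains lines shaped like the sample above. Counts accumulate across every boot recorded in the log, and "Direct-Access" may also match non-disk devices, so treat the numbers as a hint rather than a verdict.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Oracle tooling): summarize whether the
# megasas module and SCSI disks show up in a host console log.
# Usage: check_console_log /path/to/hostconsole.log
check_console_log() {
    log="$1"
    # Count lines printed when the megasas kernel module loads.
    module_lines=$(grep -c 'megasas:' "$log")
    # Count lines printed for each Direct-Access SCSI device the kernel detects.
    disk_lines=$(grep -c 'Direct-Access' "$log")
    echo "megasas_lines=$module_lines disk_lines=$disk_lines"
}
```

If both counts are non-zero, the controller and disks were seen at some point in the logged boots, which is consistent with the root filesystem corruption case described above.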
Solution
The problem may be due to improper battery replacement, improper card seating, or a card damaged during battery replacement. It may also be due to corruption of the root filesystem.

An ILOM snapshot should be gathered to assist support.

If it is determined that the storage cell needs to be reimaged, the following steps can be performed via the Java-based ILOM Remote Console. They assume that the internal USB is healthy.

1. Select the last line from the grub menu. It reads: CELL_USB_BOOT_CELLBOOT_usb_in_rescue_mode
2. When prompted, select "(r)einstall or try to recover damaged system". Confirm your decision when prompted "Are you sure?"
3. When prompted whether to erase data partition and disks, choose "no".
4. Follow the remaining prompts.
5. Once the cell is up, the celldisks will need to be imported. Run:
   cellcli -e import celldisk all force
6. Check whether flashlog and flashcache have been created:
   cellcli -e list flashcache detail
   cellcli -e list flashlog detail
7. Run the following and check whether flashCacheMode matches that of a healthy cell:
   cellcli -e list cell detail
8. Manually add the griddisks to ASM:
   a. Check the status of the griddisks:
      sqlplus / as sysasm
      col path format a59
      set pagesi 200
      set linesi 200
      select path, name, header_status, mode_status, mount_status, state from v$asm_disk order by path;
      - The griddisks belonging to the reimaged cell should show up with a header_status of CANDIDATE
   b. For each diskgroup (DATA, RECO, DBFS_DG, etc.), run:
      alter diskgroup <diskgroup name> add disk '<path to diskgroup from above query>/<diskgroup name>*<cell name>';
      e.g. for the DATA diskgroup, given a cell whose name is chsmsck00203 and whose InfiniBand is configured for active/active, hence two IPs:
      alter diskgroup DATA add disk 'o/10.111.249.15;10.111.249.16/DATA*chsmsck00203';
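Because the add-disk statement has to be repeated for each diskgroup, the pattern can be sketched as a small shell helper that only prints the statements for review. This is a minimal sketch under stated assumptions: the function name build_add_disk_sql is hypothetical, the diskgroup list is the example list from the step above, and it assumes the griddisk names begin with the diskgroup name, as in the DATA example. Nothing is executed against ASM.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the ALTER DISKGROUP statement for one diskgroup.
# $1 = diskgroup name
# $2 = cell name
# $3 = cell IP portion of the path as shown by the v$asm_disk query
build_add_disk_sql() {
    printf "alter diskgroup %s add disk 'o/%s/%s*%s';\n" "$1" "$3" "$1" "$2"
}

# Example values taken from the DATA example above; adjust to the actual cell.
for dg in DATA RECO DBFS_DG; do
    build_add_disk_sql "$dg" chsmsck00203 '10.111.249.15;10.111.249.16'
done
```

The printed statements should be compared against the paths returned by the v$asm_disk query before being pasted into the sqlplus session.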
References
<NOTE:1448069.1> - How to run an ILOM Snapshot on a Sun/Oracle X86 System