Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution
1991211.1 : Pillar Axiom: Pilots continuously reboot due to incorrect DNS servers attempting to resolve the NTP server name during Boot State Pilot
Created from <SR 3-10441055291>

Applies to:
Pillar Axiom 600 Storage System - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms
Following an upgrade or a Pilot failover (automatic or manual), the Active Pilot fails during Boot State Pilot. The Passive Pilot then becomes Active and encounters the same problem, so the two Pilots keep failing over in an endless loop.

Changes
This problem can happen when the Axiom is configured with DNS servers that are no longer in service.

Cause
DNS servers are needed during the Boot State Pilot if an NTP server has been defined. The Active Pilot generates a core dump after multiple attempts to resolve the name of the NTP server. The issue does not occur if no NTP server is defined, or if the user provided the IP address of the NTP server instead of its name.

Solution
A bug has been filed with Development, and this document will be updated once the release version is known. It is not possible to modify the DNS or NTP settings while the Axiom is in Boot State Pilot. Please open a Service Request and quote this knowledge article.
In the Tasks window of the Axiom GUI (the Tasks button is in the bottom right corner of the GUI), you should see the following task stuck at 50%:
RemoteLoadNtpSettingsOperation
The other tasks will have started only recently, whereas RemoteLoadNtpSettingsOperation remains at 50% until the Pilot fails (this can take a few minutes) without completing the Boot State Pilot.
Verify with the customer that the DNS servers are no longer working. You can use nslookup (in this example, 10.10.10.10 is the DNS server):
C:> nslookup ntp.mybusiness.com 10.10.10.10
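When several DNS servers are configured, the same check can be repeated for each of them. The following is a minimal sketch, not part of the original procedure, meant to be run from any Linux host that can reach the DNS servers. The function names are hypothetical, the input is assumed to be a resolv.conf-style file listing the DNS servers configured on the Axiom, and the script assumes that the local nslookup returns a non-zero exit status when a lookup fails (confirm this on your platform before relying on it).

```shell
# Hypothetical helper: print the addresses found on "nameserver" lines
# of a resolv.conf-style file.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "$1"
}

# Hypothetical helper: ask each listed DNS server to resolve the NTP
# server name and report the outcome per server.
# Assumption: nslookup exits non-zero when the lookup fails.
check_ntp_resolution() {
    ntp_name=$1
    conf=$2
    for server in $(list_nameservers "$conf"); do
        if nslookup "$ntp_name" "$server" >/dev/null 2>&1; then
            echo "OK   $server resolves $ntp_name"
        else
            echo "FAIL $server does not resolve $ntp_name"
        fi
    done
}
```

For example, `check_ntp_resolution ntp.mybusiness.com /var/tmp/resolv.conf` would print one OK or FAIL line per configured server.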
If you have logs, verify that they contain the following errors:
-bash-4.1$ find . -name "pcp.log*" | xargs egrep "Pacman has not heartbeat in 30 seconds"
./log/pcp.log.150309125459:2015-03-09 12:54:59.667 pilot2 pilotcfgproc: 16461 pcp:warning Pacman has not heartbeat in 30 seconds!
./log/pcp.log.150309125459-2015-03-09 12:54:59.667 pilot2 pilotcfgproc: 16462 pcp:warning Failing over to passive pilot as Pacman is not running
-bash-4.1$ find . -name "ntp.log*" | xargs egrep invalid.host.address
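The two log searches above can be combined into a single check over an extracted log bundle. This is only a sketch under stated assumptions: the function names are hypothetical, the file name globs and search patterns are taken from the commands above, and a match in both sets of logs is treated as a sign that the logs fit this article.

```shell
# Hypothetical helper: count lines matching PATTERN in files whose
# names match NAME_GLOB anywhere under DIR.
count_log_matches() {
    dir=$1
    name_glob=$2
    pattern=$3
    find "$dir" -name "$name_glob" -type f \
        -exec grep -E -h -e "$pattern" {} + 2>/dev/null | wc -l | tr -d ' '
}

# Hypothetical helper: report both counts and succeed only if the pcp
# and ntp logs each contain at least one matching line.
confirm_dns_ntp_signature() {
    dir=${1:-.}
    hb=$(count_log_matches "$dir" 'pcp.log*' 'Pacman has not heartbeat in 30 seconds')
    ntp=$(count_log_matches "$dir" 'ntp.log*' 'invalid.host.address')
    echo "pcp heartbeat warnings: $hb"
    echo "ntp invalid host address entries: $ntp"
    [ "$hb" -gt 0 ] && [ "$ntp" -gt 0 ]
}
```

Running `confirm_dns_ntp_signature .` from the top of the log bundle prints both counts and returns success only when both are greater than zero.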
Note: the following steps need to be performed very quickly in order to be successful.
Open SSH with axiomcli or pcli. See <Document 1431693.1> Pillar Axiom: How to Enable SSH Access on the Pilot(s).
SSH to the physical IP address of the Active Pilot (do not use the Virtual IP address, as we are going to stop pilotcfg).
From the Active Pilot (these examples use Pilot1), run the following command to stop pilotcfg on the Passive Pilot:
[root@pilot1 root]# ssh pilot2 service pilotcfg stop &
Then stop pilotcfg on the Active Pilot:
[root@pilot1 root]# service pilotcfg stop &
Open SSH on the two Pilots:
[root@pilot1 root]# /etc/sysconfig/pillar/open-port.sh 22
[root@pilot1 root]# ssh pilot2 /etc/sysconfig/pillar/open-port.sh 22
The rest of the steps can be done without haste.
Copy /etc/resolv.conf to /var/tmp, then edit /var/tmp/resolv.conf and set the correct nameservers (DNS servers):
[root@pilot1 root]# cp /etc/resolv.conf /var/tmp/
[root@pilot1 root]# vi /var/tmp/resolv.conf
The 3rd nameserver needs to be the same as the 2nd nameserver. Here is an example of a correct resolv.conf:
[root@pilot1 root]# cat /etc/resolv.conf
# resolv.conf created by PCP
nameserver 192.135.82.44
nameserver 192.135.82.60
nameserver 192.135.82.60
Run the following commands:
[root@pilot1 tmp]# unalias cp
The loop helps the Boot State Pilot complete: resolv.conf is meant to be updated with the DNS settings found in Persistence, and the loop keeps overwriting that incorrect DNS server configuration.
Open another PuTTY session on the physical IP address of the Active Pilot and run the following command in the new session:
[root@pilot1 tmp]# watch -n 1 'ls -la --time-style=full-iso /etc/resolv.conf'
Validate that the file is updated every second, then press CTRL-C.
Start pilotcfg on the Active Pilot:
[root@pilot1 tmp]# service pilotcfg start
Open the Axiom GUI and wait for the Boot State Pilot to complete.
Once the Axiom is in Warning (the Passive Pilot should be Unknown), go to Configure -> Networking and modify the network settings to enter the correct DNS servers.
Wait for the ModifyNetworkSettings task to reach 100% and to be cleared.
Kill the loop and close the first PuTTY session.
Go to the second PuTTY session (which is also an SSH session to the Active Pilot).
Wait one minute.
Edit /etc/resolv.conf and fix the 3rd nameserver to make it identical to the 2nd nameserver.
Edit /var/lib/pillar/pcp/pilot-config.xml and fix the 3rd <nameserver> tag to make it identical to the 2nd tag.
Copy the following files to the Passive Pilot:
[root@pilot1 tmp]# scp -p /etc/resolv.conf root@pilot2:/etc/
Start pilotcfg on the Passive Pilot:
[root@pilot1 root]# ssh pilot2 service pilotcfg start
Verify that the Passive Pilot comes back as Normal on the Axiom GUI.

References
<BUG:20737976> - PASSIVE PILOT REBOOTING IN LOOP DUE TO WRONG DNS TO RESOLVE NTP IP
<BUG:20737872> - PILOTS REBOOTING IN LOOP DUE TO WRONG DNS TO RESOLVE NTP IP

Attachments
This solution has no attachment
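
The solution refers to a loop that keeps overwriting /etc/resolv.conf, and to the rule that the 3rd nameserver must match the 2nd, but the loop command itself does not appear in the text above. The sketch below is an assumption about what such helpers could look like, not the command from the original procedure: the function names are hypothetical, and the loop is assumed to copy the edited /var/tmp/resolv.conf over /etc/resolv.conf once per second, which is consistent with the `unalias cp` step and the one-second `watch` interval.

```shell
# Hypothetical helper: succeed only if the file has at least three
# nameserver lines and the 3rd address equals the 2nd.
third_matches_second() {
    awk '$1 == "nameserver" { ns[++n] = $2 }
         END { exit !(n >= 3 && ns[2] == ns[3]) }' "$1"
}

# Hypothetical helper (assumed behaviour, see above): copy SRC over DST
# once per second. COUNT bounds the number of copies; the default of 0
# keeps going until interrupted with CTRL-C.
overwrite_every_second() {
    src=$1
    dst=$2
    count=${3:-0}
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        cp -f "$src" "$dst"
        i=$((i + 1))
        sleep 1
    done
}
```

Under these assumptions, `third_matches_second /var/tmp/resolv.conf` could be run after editing the file, and `overwrite_every_second /var/tmp/resolv.conf /etc/resolv.conf` would play the role of the loop until it is killed later in the procedure.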