Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1448492.1
Update Date: 2016-11-01
Keywords:

Solution Type  Problem Resolution Sure

Solution  1448492.1 :   Pillar Axiom: Pilot Healthcheck Serial Link Not Responding  


Related Items
  • Pillar Axiom 500 Storage System
  • Pillar Axiom 600 Storage System
Related Categories
  • PLA-Support>Sun Systems>DISK>Axiom>SN-DK: Ax600




Created from <SR 3-5573577151>

Applies to:

Pillar Axiom 500 Storage System - Version Not Applicable to Not Applicable [Release N/A]
Pillar Axiom 600 Storage System - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

 

Pilot Healthcheck Serial Link Not Responding

 


Changes

 

Pilot replacement, loss of power, bad Pilot hard drive, Axiom move, or cabling error.

 

Cause

 

The active pilot did not receive the heartbeat signal from the standby pilot. This could be due to an error on the standby pilot or a problem with the cable connection. It does not affect data serving, because the pilots are used for management functions.

 

Solution

 

From Pilot:

1- Check pilot status from within the GUI.

2- Check power to the pilot.

3- Check that the serial cable is connected correctly. The serial cable should be connected to the bottom serial ports, next to the WHITE video interface.

4- Connect a keyboard and monitor to the Pilot in question. Check the condition of the pilot from the host console, then create an SR for investigation and report the condition.

 

Troubleshooting serial port:

Serial communication is used to monitor the pilot heartbeat as part of the pcp process. The pcp log, located at /var/log/pcp.log, reports the serial communication status. If the serial link fails, it reports:

pcp:warning Serial Link Timeout - 1611 1565 3
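
To see how often this warning has been logged, the log can be searched for the message text. The following is a minimal sketch, not an Oracle-supplied tool: the function name count_serial_timeouts is hypothetical, and it assumes the message always contains the literal string "Serial Link Timeout" as in the example above.

```shell
#!/bin/sh
# Count "Serial Link Timeout" warnings in a pcp log.
# Usage: count_serial_timeouts [LOGFILE]   (defaults to /var/log/pcp.log)
count_serial_timeouts() {
    log="${1:-/var/log/pcp.log}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the function's status at 0.
    grep -c 'Serial Link Timeout' "$log" || true
}
```

Running count_serial_timeouts with no argument on a pilot prints the number of matching lines in /var/log/pcp.log; a count of 0 means the warning is not present in the current log.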

The status is also reported in the callhome logs if pilot logs are collected. It is located in ../log/pcp_runtime_info.log.<time_stamp>, for example pcp_runtime_info.log.130604171304.

 

...
--------------------------
FofbAdaptor settings
--------------------------
m_ecWarmstartCount = 0
m_ecWarmstartTimer = 0
m_guid = 2008fffffffffff2
m_masterWritten = 1
m_fofbCodReset = 0
m_shuttingDown = 0
m_pilotState = 3
m_otherPilotState = 5
m_runPacman = 0
m_pilotHeartbeatCodReset = 0
m_lastSerialHeartbeat = 2439
m_pacmanHeartbeat = 0
m_smProviderHeartbeat = 0
m_excludedStateSet = 0
m_sameSwVerPrinted = 1
m_softwareUpdateInProgress = 1
m_buddyColdStartInProgress = 0
m_blockPacmanConman = 0
m_serialFailedEventSent = 1
System state = 8
...
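
The serial-related fields in this dump (m_lastSerialHeartbeat and m_serialFailedEventSent) can be pulled out of a collected pcp_runtime_info file with awk. This is a minimal sketch under stated assumptions: the function name runtime_field is hypothetical, and it assumes each setting appears on its own line as "name = value", as in the excerpt above. The article does not define the meaning of the values, so the sketch only extracts them.

```shell
#!/bin/sh
# Print the value of one "name = value" field from a pcp_runtime_info dump.
# Usage: runtime_field FIELD FILE
runtime_field() {
    # Default awk field splitting ignores leading whitespace, so the line
    # "m_lastSerialHeartbeat = 2439" yields $1=name, $2="=", $3=value.
    # Only the first matching line is printed.
    awk -v key="$1" '$1 == key && $2 == "=" { print $3; exit }' "$2"
}
```

For example, runtime_field m_serialFailedEventSent pcp_runtime_info.log.130604171304 prints the value recorded in that file, and prints nothing if the field is absent.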

 

The port driver is pcp itself; it uses only the OS termios, fcntl, and system ioctls. It does not use any OS serial drivers other than termios.

If you see this string in /var/log/pcp.log, pcp is unable to attach to the serial port:

pcp:warning Unable to open serial port /dev/ttyS0


If you see the following string in /var/log/pcp.log, pcp has successfully attached to the serial port:

pcp:debug Successfully configured I/O parameters for serial port /dev/ttyS0
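
Because both messages can appear in the same log over time, it can help to look only at the most recent one. The sketch below is illustrative: the function name serial_port_status and its output words (attached, not-attached, unknown) are hypothetical, and it assumes the two message texts appear exactly as quoted above.

```shell
#!/bin/sh
# Report which of the two serial-port messages appears last in a pcp log.
# Usage: serial_port_status [LOGFILE]   (defaults to /var/log/pcp.log)
serial_port_status() {
    log="${1:-/var/log/pcp.log}"
    # Keep only the two messages of interest, then take the last one.
    last=$(grep -E 'Unable to open serial port|Successfully configured I/O parameters for serial port' "$log" | tail -n 1)
    case "$last" in
        *'Unable to open serial port'*)             echo "not-attached" ;;
        *'Successfully configured I/O parameters'*) echo "attached" ;;
        *)                                          echo "unknown" ;;
    esac
}
```

A result of unknown only means neither message was found in that log file; it says nothing about the state of the port.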

 

Test serial communication between pilots:

 

You can test serial port communication by redirecting output to the port.

  1. Enable ssh to the pilots.
  2. On one pilot CU, run "cat /dev/ttyS0" and leave it running. Anything arriving on the serial port is printed.
  3. On the other CU, redirect the stdout of any command to that port, for example "echo foobar > /dev/ttyS0". You should see 'foobar' in the other pilot's cat output.

If the cable is not working, you will not see any output.
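
The listening side of this test can be given a time limit so that it does not block indefinitely when nothing arrives. The following is a sketch only: the function name wait_for_marker is hypothetical, and it assumes the timeout(1) utility from GNU coreutils is available on the system where it is run, which this article does not confirm for the pilot OS.

```shell
#!/bin/sh
# Listen on a serial device for a limited time and report whether a marker
# string arrived.
# Usage: wait_for_marker DEVICE MARKER SECONDS
# Exit status 0 if MARKER was seen, non-zero otherwise.
wait_for_marker() {
    dev="$1"
    marker="$2"
    secs="$3"
    # timeout stops cat after SECS seconds if it is still reading.
    # grep -q -F looks for MARKER as a fixed string and reports via its
    # exit status; the shell still waits for cat to finish before returning.
    timeout "$secs" cat "$dev" 2>/dev/null | grep -q -F -- "$marker"
}
```

On the listening pilot this could be run as wait_for_marker /dev/ttyS0 foobar 30, while "echo foobar > /dev/ttyS0" is issued on the other pilot within those 30 seconds.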

 

Test pilot serial ports:

 

Another failure scenario is a failed serial port on a pilot CU. If you replaced the cable with a known-good one and the serial link still does not work, this may be the cause. To identify the failing FRU, use another device capable of serial communication, such as a laptop with a serial port or a USB-to-serial adapter. You can use the USB-to-serial adapter from the brick console cable set.

 

To perform test:

  1. Enable ssh on the pilots.
  2. Connect the laptop to the top pilot with the serial cable and open a PuTTY serial session. The serial port parameters are 19,200 bps, 8 data bits, no parity, 1 stop bit, and no flow control.
  3. From the top pilot ssh prompt, run: echo "test from pilot CU0" > /dev/ttyS0
  4. Check whether anything appears in the PuTTY serial output on the laptop.
  5. If nothing appears, pilot CU0 may be the bottom pilot; run the same command on the other pilot.
  6. Connect the serial cable to the bottom pilot.
  7. From the bottom pilot ssh prompt, run: echo "test from pilot CU1" > /dev/ttyS0
  8. Check whether anything appears in the PuTTY serial output.

 The serial cable between the pilots can also be tested by connecting it to the Slammer console.

 

Example of working serial link:

 

When you look in the current /var/log/pcp.log you should see the fofb node matrix go by every few seconds. Toward the end of this, you should see the NODE_PASSIVE NOOP messages from the other pilot, which would tell you that the serial port is working.

 

Here is a serial port test run from pilot2, redirecting stdout to the tty:

[root@pilot2 root]# echo "THIS IS A SERIAL PORT TEST" > /dev/ttyS0

 

On pilot1 you should see a heartbeat frame about every 5 seconds, with the test string appearing among them:

[root@pilot1 dev]# cat /dev/ttyS0

<<<PILOT_TWO****NODE_PASSIVE*CMD_NOOP*****>>>

<<<PILOT_TWO****NODE_PASSIVE*CMD_NOOP*****>>>

THIS IS A SERIAL PORT TEST

<<<PILOT_TWO****NODE_PASSIVE*CMD_NOOP*****>>>
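
If the cat output is saved to a file, the heartbeat frames and any test strings can be separated with grep. This is a minimal sketch: the function names count_heartbeats and non_heartbeat_lines are hypothetical, and it assumes heartbeat frames always begin with "<<<" and contain "NODE_PASSIVE*CMD_NOOP" as in the example above.

```shell
#!/bin/sh
# Count heartbeat frames (lines starting with <<< that contain
# NODE_PASSIVE*CMD_NOOP) in captured serial output.
# Usage: count_heartbeats FILE
count_heartbeats() {
    # \* matches a literal asterisk; "|| true" covers the zero-match case.
    grep -c '^<<<.*NODE_PASSIVE\*CMD_NOOP' "$1" || true
}

# Print every non-blank line that is not a <<<...>>> frame, for example a
# test string echoed from the other pilot.
# Usage: non_heartbeat_lines FILE
non_heartbeat_lines() {
    grep -v -e '^<<<' -e '^[[:space:]]*$' "$1" || true
}
```

A non-zero heartbeat count indicates frames from the other pilot are arriving; non_heartbeat_lines shows whether the manually echoed test string came through as well.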


If you cannot get a response from any serial port, run lsof on /dev/ttyS0 to make sure pcp is bound to it.

A correct example:

[root@pilot2 tty]# lsof /dev/ttyS0
COMMAND    PID USER   FD   TYPE DEVICE SIZE NODE NAME
pilotcfgp 3445 root   10r   CHR   4,64      3832 /dev/ttyS0
pilotcfgp 3445 root   11w   CHR   4,64      3832 /dev/ttyS0
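
The same check can be scripted by reading the lsof output and confirming that one command holds the device open for both reading and writing. This is a sketch under stated assumptions: the function name holds_read_and_write is hypothetical, the command name passed to it (pilotcfgp here) is taken from the example output above, and the FD column is assumed to be the fourth field, as in that output.

```shell
#!/bin/sh
# Read lsof output on stdin and succeed if COMMAND holds the device open
# for both reading and writing.
# Usage: lsof /dev/ttyS0 | holds_read_and_write COMMAND
holds_read_and_write() {
    awk -v cmd="$1" '
        # Field 4 is the FD column, e.g. "10r" (read) or "11w" (write).
        $1 == cmd && $4 ~ /^[0-9]+r$/ { r = 1 }
        $1 == cmd && $4 ~ /^[0-9]+w$/ { w = 1 }
        # lsof uses "u" for a descriptor open for both read and write.
        $1 == cmd && $4 ~ /^[0-9]+u$/ { r = 1; w = 1 }
        END { exit !(r && w) }
    '
}
```

For example, lsof /dev/ttyS0 | holds_read_and_write pilotcfgp returns success for the output shown above and failure if either the read or the write descriptor is missing.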


 


  Copyright © 2018 Oracle, Inc.  All rights reserved.