Asset ID: 1-71-1534465.1
Update Date: 2017-05-18
Solution Type: Technical Instruction Sure Solution
1534465.1: How to collect data for Netra CT900 related problems
Related Items
- Netra T3-1BA
- Sun Netra CT900 Server
- Sun Netra CP3260 ATCA Blade Server
- Sun Netra CT1600 Server
- Sun Netra CP3010 Blade Server
- Sun Netra CP3060 ATCA Blade Server
Related Categories
- PLA-Support>Sun Systems>SPARC>Usx/Blade/Netra>SN-SPARC: Netra Cxxxx
How to collect relevant data based on problematic component of Netra CT900
Applies to:
Sun Netra CT900 Server - Version All Versions and later
Sun Netra CP3010 Blade Server - Version Not Applicable and later
Sun Netra CP3060 ATCA Blade Server - Version Not Applicable and later
Sun Netra CP3260 ATCA Blade Server - Version Not Applicable and later
Netra T3-1BA - Version Not Applicable and later
Information in this document applies to any platform.
Goal
Enable the customer to collect the relevant data for troubleshooting.
Solution
The customer should collect the relevant information based on the following chassis components:
- SAP (Shelf Alarm Panel)
- FT (Fan Tray)
- PEM (Power Entry Module)
- ShMM (Shelf Management Module)
- Blade/RTM
SAP (Shelf Alarm Panel)
Symptom: Serial port to ShMM not working
To do:
- Connect using a terminal server or a laptop/workstation
- Verify that the serial port baud rate is set correctly
- Check the integrity of the cable
- If all of the above are OK, replace the SAP
Fan Tray (FT)
Symptom:
- SAP alarm LED lights up
- FT RED/BLUE LED lights up
- Sensor shows an abnormal temperature (either an SNMP message is received or an alarm is triggered)
- Fans run at an abnormal speed (FT not at level 5)
To do:
- Identify which FT is at fault
- Check the LED status using: [clia] getfruledstate 20 [3|4|5]
- Check fan state ([clia] fans) and cooling state ([clia] shelf -v fs, [clia] shelf -v cs)
- Check sensor reading: [clia] sensordata board <slot #>
- Check air filter
Example output:
# clia getfruledstate 20 3
20: FRU # 3, Led # 0 ("BLUE LED"):
Local Control LED State: LED OFF
20: FRU # 3, Led # 1 ("LED 1"):
Local Control LED State: LED OFF
20: FRU # 3, Led # 2 ("LED 2"):
Local Control LED State: LED ON, color: GREEN
# clia fans
20: FRU # 3
Current Level: 5
Minimum Speed Level: 0, Maximum Speed Level: 15
20: FRU # 4
Current Level: 5
Minimum Speed Level: 0, Maximum Speed Level: 15
20: FRU # 5
Current Level: 5
Minimum Speed Level: 0, Maximum Speed Level: 15
# clia shelf -v cs
Cooling state: "Normal"
Sensor(s) at this state: (0x8e,6,0) (0x90,7,0) (0x90,8,0) (0x90,23,0)
(0x90,24,0) (0x90,25,0) (0x90,40,0) (0x90,41,0)
(0x90,42,0) (0x8e,5,0) (0x90,6,0) (0x86,20,0)
(0x86,21,0) (0x86,22,0) (0x86,23,0) (0x86,24,0)
(0x86,25,0) (0x86,26,0) (0x92,6,0) (0x92,7,0)
(0x92,30,0) (0x92,31,0) (0x86,19,0) (0x94,7,0)
(0x94,8,0) (0x94,23,0) (0x94,24,0) (0x94,25,0)
(0x94,40,0) (0x94,41,0) (0x94,42,0) (0x92,5,0)
(0x9a,5,0) (0x9a,6,0) (0x9a,29,0) (0x9a,30,0)
(0x9a,31,0) (0x94,6,0) (0x96,5,0) (0x96,6,0)
(0x96,29,0) (0x96,30,0) (0x96,31,0) (0x9a,4,0)
(0x9c,4,0) (0x9c,5,0) (0x96,4,0) (0x82,20,0)
(0x82,36,0) (0x82,37,0) (0x82,44,0) (0x82,45,0)
(0x82,52,0) (0x82,53,0) (0x9c,3,0) (0x82,10,0)
(0x88,6,0) (0x88,7,0) (0x88,8,0) (0x88,23,0)
(0x88,24,0) (0x88,25,0) (0x20,120,0) (0x20,121,0)
(0x20,122,0) (0x20,123,0) (0x20,124,0) (0x20,125,0)
(0x20,126,0) (0x20,200,0) (0x20,201,0) (0x98,3,0)
(0x98,4,0) (0x98,5,0)
# clia shelf -v fs
Fans state: "Normal"
Sensor(s) at this state: (0x10,8,0) (0x10,10,0) (0x10,11,0) (0x10,13,0)
(0x10,14,0) (0x10,7,0)
NOTE: When only the SAP LED is lit, all data will usually check out fine because the FTs have already sped up and cooled the chassis. In that case, simply clear the alarm ([clia] alarm clear).
NOTE: Be sure to re-seat the FT at least once before deciding to replace it.
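The fan level check above can be scripted once the output of [clia] fans has been captured to text. The following is a minimal Python sketch, not part of the original procedure: the function name abnormal_fan_trays and the sample text (with FRU 4 altered to level 9) are illustrative assumptions, and the parsing relies only on the output layout shown in the example above.

```python
import re

NORMAL_LEVEL = 5  # the level this document treats as normal for a fan tray


def abnormal_fan_trays(clia_fans_output, normal_level=NORMAL_LEVEL):
    """Return {fru_number: current_level} for fan trays not at normal_level.

    Hypothetical helper: expects text laid out like the "clia fans"
    example output shown in this document.
    """
    abnormal = {}
    fru = None
    for line in clia_fans_output.splitlines():
        fru_match = re.search(r"FRU #\s*(\d+)", line)
        if fru_match:
            # Remember which fan tray the following lines describe.
            fru = int(fru_match.group(1))
            continue
        level_match = re.search(r"Current Level:\s*(\d+)", line)
        if level_match and fru is not None:
            level = int(level_match.group(1))
            if level != normal_level:
                abnormal[fru] = level
    return abnormal


# Made-up sample: FRU 4 has been changed to level 9 for illustration.
sample = """\
20: FRU # 3
Current Level: 5
Minimum Speed Level: 0, Maximum Speed Level: 15
20: FRU # 4
Current Level: 9
Minimum Speed Level: 0, Maximum Speed Level: 15
"""
print(abnormal_fan_trays(sample))  # {4: 9}
```

An empty result means every listed FT reports level 5; any entry identifies the FRU number to inspect further with getfruledstate.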
PEM (Power Entry Module)
Symptom: RED/BLUE LED lights up
To do:
- If replacement is needed, the customer will have to provide a licensed electrician.
- Check the LED status using: [clia] getfruledstate 20 [6|7]
- Check PEM sensors for any "Entity Absent" state
- Replace the faulty PEM
Example Output:
# clia getfruledstate 20 6
20: FRU # 6, Led # 0 ("BLUE LED"):
Local Control LED State: LED OFF
20: FRU # 6, Led # 1 ("LED 1"):
Local Control LED State: LED OFF
20: FRU # 6, Led # 2 ("LED 2"):
Local Control LED State: LED ON, color: GREEN
# clia sensor 20 | grep PEM
20: LUN: 0, Sensor # 162 ("PEM A In 2")
20: LUN: 0, Sensor # 163 ("PEM A In 2 Fused")
20: LUN: 0, Sensor # 164 ("PEM A In 1")
20: LUN: 0, Sensor # 165 ("PEM A In 1 Fused")
20: LUN: 0, Sensor # 166 ("PEM A In 4")
20: LUN: 0, Sensor # 167 ("PEM A In 4 Fused")
20: LUN: 0, Sensor # 168 ("PEM A In 3")
20: LUN: 0, Sensor # 169 ("PEM A In 3 Fused")
20: LUN: 0, Sensor # 174 ("PEM B In 2")
20: LUN: 0, Sensor # 175 ("PEM B In 2 Fused")
20: LUN: 0, Sensor # 176 ("PEM B In 1")
20: LUN: 0, Sensor # 177 ("PEM B In 1 Fused")
20: LUN: 0, Sensor # 178 ("PEM B In 4")
20: LUN: 0, Sensor # 179 ("PEM B In 4 Fused")
20: LUN: 0, Sensor # 180 ("PEM B In 3")
20: LUN: 0, Sensor # 181 ("PEM B In 3 Fused")
20: LUN: 0, Sensor # 192 ("PEM A")
20: LUN: 0, Sensor # 193 ("PEM B")
20: LUN: 0, Sensor # 200 ("PEM A Temp")
20: LUN: 0, Sensor # 201 ("PEM B Temp")
# clia sensordata 20 164
20: LUN: 0, Sensor # 164 ("PEM A In 1")
Type: Discrete (0x6f), "Entity Presence" (0x25)
Status: 0xc0
All event messages enabled from this sensor
Sensor scanning enabled
Initial update completed
Sensor reading: 0x00
Current State Mask 0x0001
Entity Present
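When many PEM sensors have to be checked, the presence state can be pulled out of each captured [clia] sensordata report. The sketch below is an illustration only: the helper name entity_presence is hypothetical, and it assumes that an absent entity is reported on its own line as "Entity Absent", mirroring the "Entity Present" line in the example above.

```python
import re


def entity_presence(sensordata_output):
    """Return (sensor_name, state) from one "clia sensordata" report.

    state is "Entity Present", "Entity Absent", or None if neither
    line appears. Hypothetical helper; the "Entity Absent" wording is
    an assumption based on the state name quoted in this document.
    """
    name_match = re.search(r'Sensor #\s*\d+\s*\("([^"]+)"\)', sensordata_output)
    name = name_match.group(1) if name_match else None
    state = None
    for line in sensordata_output.splitlines():
        stripped = line.strip()
        # Match whole lines only, so the "Entity Presence" type line is ignored.
        if stripped in ("Entity Present", "Entity Absent"):
            state = stripped
    return name, state


# Sample text copied from the example output above.
sample = """\
20: LUN: 0, Sensor # 164 ("PEM A In 1")
Type: Discrete (0x6f), "Entity Presence" (0x25)
Status: 0xc0
Sensor reading: 0x00
Current State Mask 0x0001
Entity Present
"""
print(entity_presence(sample))  # ('PEM A In 1', 'Entity Present')
```

Running this over the report for each sensor number listed by [clia] sensor 20 gives a quick list of any PEM inputs that are not present.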
ShMM (Shelf Management Module)
Symptom:
- Could not log in
- Could not ping
- No console access to blade ([clia] console <slot #>)
- Firmware upgrade problem
- SNMP related problem
To do:
- Obtain from the customer a clear description and a log of what has been attempted
- Collect the following files:
  - /tmp/debug.log (created by the command /etc/summary)
  - /etc/shelfman.conf
  - /etc/openhpi.conf
For any networking related problem (remote login or ping failures), make sure there is a route to the ShMM and that the route is pingable from both directions.
For console access (or netconsole) problems, check /etc/openhpi.conf and the switch blade settings. Make sure the VLAN 55 IP addresses (from /var/netcons.ip) are pingable.
For firmware upgrade problems, obtain a complete log and check the command arguments carefully. Make sure the correct version is used and the README file is followed.
For SNMP problems, check two things:
- "df -k" output: /dev/ram0 should not be full; otherwise there is not enough swap memory and some processes will be shut down
- CP3060 voltage events on sensor 9 (see doc 1346085.1 for sensor numbers) or on other voltage sensors: if the voltage drops to the threshold, numerous IPMB events are generated and the ShMM stops responding to SNMP probing; the workaround is to lower the threshold:
# clia help setthreshold
Set the specified threshold of the dedicated sensor
unc - Upper Non Critical
uc - Upper Critical
unr - Upper Non Recoverable
lnc - Lower Non Critical
lc - Lower Critical
lnr - Lower Non Recoverable
instead of <addr> user may use:
board <N>
shm <N>
to access the sensor on the specified board
"-r <value>" considers <value> as unsigned byte
just "<value>" considers as the floating point number
setthreshold board 21 "IPMB LINK" unc -r 34
setthreshold 20 8 lc -45.67
setthreshold <addr> [ lun: ] | unc | uc | unr | lnc | lc | lnr [-r] value
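The /dev/ram0 check listed above for SNMP problems can also be automated from saved "df -k" output. This is a minimal Python sketch under stated assumptions: the helper name ram0_usage_percent, the 90 percent warning level, and the sample output (including its sizes and mount point) are all hypothetical and are not taken from a real ShMM.

```python
def ram0_usage_percent(df_output, device="/dev/ram0"):
    """Return the Use% value for device as an int, or None if not found.

    Hypothetical helper: scans "df -k" style text for the line whose
    first field is the device name and reads the field ending in "%".
    """
    for line in df_output.splitlines():
        fields = line.split()
        if fields and fields[0] == device:
            for field in fields[1:]:
                if field.endswith("%") and field[:-1].isdigit():
                    return int(field[:-1])
    return None


# Made-up sample output; values and mount point are illustrative only.
sample = """\
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/ram0                 7931      7540       391  95% /var
"""

WARN_PERCENT = 90  # assumed warning level, not a documented limit

usage = ram0_usage_percent(sample)
if usage is not None and usage >= WARN_PERCENT:
    print("/dev/ram0 is at %d%%, close to full" % usage)
```

A return value of None means the device line was not found in the captured text, which is worth reporting rather than treating as healthy.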
Blade / RTM
Symptom: Blade shut down, hang, panic; could not boot, etc.
To do:
- Collect explorer / core dump / snapshot
- Troubleshoot as you would any other UltraSPARC system
- Be sure to re-seat the blade before replacing it: when a blade dies, residual values in the IPMB controller (H8) might prevent the blade from booting back up normally; if rebooting the blade does not clear them, re-seating the blade will.