Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-79-1437249.1
Update Date:2017-11-01
Keywords:

Solution Type  Predictive Self-Healing Sure

Solution  1437249.1 :   CT900 Troubleshooting & Data Collection Cheat Sheet  


Related Items
  • Sun Netra CT900 Server
Related Categories
  • PLA-Support>Sun Systems>x86>Blades>SN-x64: TELCO-BL-NETRA




In this Document
Purpose
Scope
Details
 Data Collection:


Oracle Confidential PARTNER - Available to partners (SUN).
Reason: restricted audience

Applies to:

Sun Netra CT900 Server - Version Not Applicable to Not Applicable [Release N/A]
Information in this document applies to any platform.

Purpose

Troubleshooting Cheat Sheet for Sun Netra CT-900 platform issues

Scope

Troubleshooting basics and how to collect data for additional problem analysis

Details

Acronyms:

Acronym   Description                        Location / Notes
FT        Fan Tray                           Bottom of chassis
PEM       Power Entry Module                 Back of chassis; equivalent of a PSU (Power Supply Unit)
SAP       System Alarm Panel                 Top right corner
ShMM      Shelf Management Module            Lower right corner; equivalent of a Service Processor (SP)
shm1      Top ShMM
shm2      Bottom ShMM
RTM       Rear Transition Module             Back of chassis, directly behind controlling blade
ARTM      Advanced Rear Transition Module    Back of chassis, directly behind controlling blade
AMC       Advanced Mezzanine Card            Front slots on blade
   

IPMB Address Table

This table converts Physical Slot numbers to ShMM IPMB Address and SW status LED

Physical Slot   shm1  shm2  Shelf  1     2     3     4     5     6     7     8     9     10    11    12    13    14
Logical Slot    --    --    --     13    11    9     7     5     3     1     2     4     6     8     10    12    14
SW: Base        --    --    --     0/13  0/11  0/9   0/7   0/5   0/3   --    --    0/4   0/6   0/8   0/10  0/12  0/14
SW: Extended    --    --    --     0/12  0/10  0/8   0/6   0/4   0/2   --    --    0/3   0/5   0/7   0/9   0/11  0/13
IPMB Address    10    12    20     9a    96    92    8e    8a    86    82    84    88    8c    90    94    98    9c
HW Address      08    09    10     4d    4b    49    47    45    43    41    42    44    46    48    4a    4c    4e
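
The table lookup above can be sketched in a few lines of Python. This is only an illustration, not part of any Sun/Oracle tool: the dictionaries are copied from the table rows, and the names `slot_info` and `check_table_pattern` are hypothetical. The pattern checked by `check_table_pattern` (HW address = 0x40 + logical slot, IPMB address = 2 x HW address) is an observation about the values in this table, not a statement from the platform documentation.

```python
# Values copied from the IPMB address table; keys are physical slot numbers.
LOGICAL_SLOT = {1: 13, 2: 11, 3: 9, 4: 7, 5: 5, 6: 3, 7: 1,
                8: 2, 9: 4, 10: 6, 11: 8, 12: 10, 13: 12, 14: 14}
HW_ADDRESS = {1: 0x4D, 2: 0x4B, 3: 0x49, 4: 0x47, 5: 0x45, 6: 0x43, 7: 0x41,
              8: 0x42, 9: 0x44, 10: 0x46, 11: 0x48, 12: 0x4A, 13: 0x4C, 14: 0x4E}
IPMB_ADDRESS = {1: 0x9A, 2: 0x96, 3: 0x92, 4: 0x8E, 5: 0x8A, 6: 0x86, 7: 0x82,
                8: 0x84, 9: 0x88, 10: 0x8C, 11: 0x90, 12: 0x94, 13: 0x98, 14: 0x9C}


def slot_info(physical_slot):
    """Return the table row for one physical slot (hypothetical helper)."""
    if physical_slot not in LOGICAL_SLOT:
        raise ValueError("physical slot must be 1..14")
    return {
        "logical_slot": LOGICAL_SLOT[physical_slot],
        "hw_address": "%02x" % HW_ADDRESS[physical_slot],
        "ipmb_address": "%02x" % IPMB_ADDRESS[physical_slot],
    }


def check_table_pattern():
    """True if every slot follows hw = 0x40 + logical and ipmb = 2 * hw."""
    return all(
        HW_ADDRESS[s] == 0x40 + LOGICAL_SLOT[s] and IPMB_ADDRESS[s] == 2 * HW_ADDRESS[s]
        for s in LOGICAL_SLOT
    )


if __name__ == "__main__":
    print(slot_info(3))
    print(check_table_pattern())
```

For example, `slot_info(3)` returns IPMB address `92`, which matches the address shown in the `clia fruinfo board 3` output later in this document.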
                                   

Where to obtain Data?

Problem   Where to look?   Physical Location
Chassis   ShMM Console     SAP Panel
Switch    Switch Console   Switch Blade Panel
Blade     Blade Console    Computing Blade Panel

 

Data Collection:

 

 

  • Any Chassis related problem (SAP, FT, PEM, or ShMM): /tmp/debug.log file and various command outputs from the ShMM
  • Switch related: Command output from switch console and network topology information.
  • Blade related - hardware problem: /tmp/debug.log file and OS data collection script output (explorer, sosreport)

First thing to try when a problem occurs with a ShMM, blade or RTM/ARTM:  Re-seat the component or move it (blade or RTM/ARTM) to a different available slot within a 1 to 3 minute interval.  In the case of a ShMM, fail over to the other ShMM and retry.


The customer should also provide a full description of all LED activity when the problem occurs.  For example:  "As a blade comes up, when power is applied, all LEDs light up at least once, then the blue (hot-swap) LED blinks at a slow rate, and then the green (OK) LED lights up solid as the blue LED goes off, and then the blade is at OBP."

Notes:
  1. The /tmp/debug.log file output is the equivalent of a SP snapshot for the ShMM. It is generated by executing the /etc/summary script on the active ShMM.
  2. Various commands for the ShMM are listed below in another section.
  3. Various commands for the switch blades are also listed below in another section.
  4. If the problem can be isolated to the computing blade itself, follow the regular UltraSPARC or x64/x86 troubleshooting steps as if it were a standalone system.

In addition to the above outputs/files, it would also be desirable if the customer provided the following information:

  • Is the problem/symptom reproducible?
  • If so, what are the customer's steps to reproduce it?
  • Step-by-step description of what the customer has done so far to troubleshoot the problem
  • Any additional Console Logs or outputs that the customer may have that will help to assess the problem/symptom

Various ShMM Commands to help debug an issue:
  • From the active ShMM

      • # clia board | grep -i slot (quick look at which slots are populated)

      • # /etc/summary (generates /tmp/debug.log, which the customer will have to upload)

      • ShMM commands – if the customer knows which blade has an issue, run the following commands to get more details about the blade:

        • # clia board (display basic info of all boards installed in chassis)

        • # clia board -v (displays board info plus firmware versions)

        • # clia fru (list all FRU's in the chassis)

        • # clia fruinfo -v board <slot number>

        • # clia board -v <slot number> (reports version of firmware installed on board)

        • # clia getfruledstate <IPMB address> (reports current status of blade LEDs)

          • Blue: Ready to remove

          • Amber: Fault

          • Green: OK

        • # clia showunhealthy

        • # clia sensor board <slot number> | grep Sensor (reports description of all sensors)

        • # clia getthreshold board <slot number> <sensor number> (reports threshold ranges)

        • # clia sensordata board <slot number> <sensor number> (reports current sensor value)
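
The per-blade commands listed above can be collected into a checklist for a given slot. The following Python sketch only formats command strings taken from that list; it does not run anything on the ShMM. The function name `blade_debug_commands` and its parameters are hypothetical.

```python
def blade_debug_commands(slot, ipmb_address=None, sensor_number=None):
    """Return the clia command lines from the list above for one physical slot.

    ipmb_address is the hex IPMB address of the blade (see the IPMB table);
    sensor_number is a sensor number reported by 'clia sensor board <slot>'.
    """
    commands = [
        "clia board -v %d" % slot,
        "clia fruinfo -v board %d" % slot,
        "clia showunhealthy",
        "clia sensor board %d" % slot,
    ]
    if ipmb_address is not None:
        commands.append("clia getfruledstate %s" % ipmb_address)
    if sensor_number is not None:
        commands.append("clia getthreshold board %d %d" % (slot, sensor_number))
        commands.append("clia sensordata board %d %d" % (slot, sensor_number))
    return commands


if __name__ == "__main__":
    for line in blade_debug_commands(3, ipmb_address="92", sensor_number=5):
        print("# " + line)
```

The printed list can be handed to the customer as the set of outputs to capture from the active ShMM for the suspect blade.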

FRU Information

From the ShMM console:  Use [clia] fruinfo command to obtain part number and serial number of component

        # clia help fruinfo
        Display the FRU Info of the dedicated FRU in the readable format
        instead of <addr> <fru_id> user may use:
        power_supply <N> (valid in 2.x systems only)
        fan_tray <N>
        board <N>
        shm <N>
        to access the FRU on the specified board
        fruinfo board 14
        fruinfo power_supply 4
        fruinfo <add> <fru_id>

Examples:

        # clia fruinfo shm 1
       10: FRU # 0, FRU Info
       Common Header: Format Version = 1
        Internal Use Area:
        Version = 1
        Board Info Area:
        Version = 1
        Language Code = 25
        Mfg Date/Time = Oct 15 13:00:00 2007 (6199980 minutes since 1996)
        Board Manufacturer = Schroff GmbH
        Board Product Name = ACBIV Rad Split USB
        Board Serial Number = 1480701210
        Board Part Number = 21596-247
        FRU Programmer File ID = 21596247ABBIN.bin
        Product Info Area:
        Version = 1
        Language Code = 25
        Manufacturer Name = Sun Microsystems, Inc.
        Product Name = NETRA,CT900,SHELF_MGR
        Product Part / Model# = 371-3037-01
        Product Version = 50
        Product Serial Number = 1005SCH-0748AX1224
        Asset Tag = 0000000000000001
        FRU Programmer File ID = 21596247ABBIN.bin
        Multi Record Area:
        PICMG Board Point-to-Point Connectivity Record (ID=0x14)
        Version = 0
      
       # clia fruinfo 20 3 <---- clia fruinfo fan_tray 1
       20: FRU # 3, FRU Info
       Common Header: Format Version = 1
       Board Info Area:
       Version = 1
       Language Code = 25
       Mfg Date/Time = Dec 21 00:00:00 2005 (5244480 minutes since 1996)
       Board Manufacturer = Schroff
       Board Product Name = Fan Tray Controller
       Board Serial Number = 0000001
       Board Part Number = 23098-533
       FRU Programmer File ID =
       Product Info Area:
       Version = 1
       Language Code = 25
       Manufacturer Name = Schroff
       Product Name = Fan Tray
       Product Part / Model# = 21594-189
       Product Version = Rev. 1.00
       Product Serial Number = 0000001
       Asset Tag =
       FRU Programmer File ID = /var/nvdata/Schroff_21594189_AA.inf

       # clia fruinfo board 3
        Pigeon Point Shelf Manager Command Line Interpreter
        92: FRU # 0, FRU Info
        Common Header: Format Version = 1
        Board Info Area:
        Version = 1
        Language Code = 25
        Mfg Date/Time = Nov 14 11:59:00 2002 (3613679 minutes since 1996)
        Board Manufacturer = Sun Microsystems, Inc.
        Board Product Name = Netra CP3060
        Board Serial Number = WJ009D
        Board Part Number = 50176570152
        FRU Programmer File ID = 520-3967.fru-info.inf
        Product Info Area:
        Version = 1
        Language Code = 25
        Manufacturer Name = Sun Microsystems, Inc.
        Product Name = Netra CP3060
        Product Part / Model# = 50176570152 <---- 501-7657-01, REV 52
        Product Version = 2007.05.03.v1.1
        Product Serial Number = WJ009D
        Asset Tag =
        FRU Programmer File ID = 520-3967.fru-info.inf
        Multi Record Area:
        PICMG Board Point-to-Point Connectivity Record (ID=0x14)
        Version = 0
        AMC Carrier Information Table Record (ID=0x1a)
        Version = 0
        AMC Carrier Activation and Current Management Record (ID=0x17)
        Version = 0
        AMC Carrier Point-to-Point Connectivity Record (ID=0x18)
        Version = 0
        AMC Point-to-Point Connectivity Record (ID=0x19)
        Version = 0
        AMC Point-to-Point Connectivity Record (ID=0x19)
        Version = 0
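
When the fruinfo output has been saved to a file, the part number and serial number can be pulled out with a small parser. The sketch below is a hedged illustration in Python: the function names `parse_fruinfo` and `split_part_number` are hypothetical, the parser only handles the "name = value" layout shown in the examples above, and the 3-4-2 digit split plus 2-digit revision simply mirrors the annotation "50176570152 <---- 501-7657-01, REV 52" in the CP3060 example.

```python
import re

# Shortened copy of the 'clia fruinfo board 3' example above.
SAMPLE = """\
92: FRU # 0, FRU Info
Common Header: Format Version = 1
Board Info Area:
Version = 1
Board Product Name = Netra CP3060
Board Serial Number = WJ009D
Board Part Number = 50176570152
Product Info Area:
Version = 1
Product Part / Model# = 50176570152 <---- 501-7657-01, REV 52
Product Serial Number = WJ009D
Asset Tag =
Multi Record Area:
PICMG Board Point-to-Point Connectivity Record (ID=0x14)
Version = 0
"""

_FIELD = re.compile(r"^(.+?) =\s*(.*)$")


def parse_fruinfo(text):
    """Group 'name = value' lines by the '... Area:' heading they follow."""
    areas = {}
    current = "Header"
    for raw in text.splitlines():
        line = raw.split("<--")[0].strip()  # drop any '<----' annotation
        if line.endswith("Area:"):
            current = line[:-1]
            continue
        match = _FIELD.match(line)
        if match:
            areas.setdefault(current, {})[match.group(1)] = match.group(2)
    return areas


def split_part_number(raw):
    """Split an 11-digit value into (part number, revision), or return None."""
    if len(raw) != 11 or not raw.isdigit():
        return None
    return "%s-%s-%s" % (raw[:3], raw[3:7], raw[7:9]), raw[9:]


if __name__ == "__main__":
    info = parse_fruinfo(SAMPLE)
    print(info["Board Info Area"]["Board Serial Number"])
    print(split_part_number(info["Board Info Area"]["Board Part Number"]))
```

Part numbers that are already hyphenated, such as the Schroff values in the ShMM and fan tray examples, are left alone because `split_part_number` returns None for anything that is not exactly 11 digits.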

LED Status

ShMM Console: Use the [clia] getfruledstate command to obtain the LED status of all chassis components

        # clia help getfruledstate
        Returns the state of the FRU's LED(s)
        instead of <addr> <fru_id> user may use:
        board <N>
        shm <N>
        If the <LedId> parameter is not specified, all the LEDs related to the specified FRU
        are queried; Otherwise that specified LED is queried only
        If -v option is specified, additional information about LED(s) properties will
        be printed
        Examples:
               getfruledstate 20 4 1
               getfruledstate 20 4
               getfruledstate 20
        getfruledstate [-v] [<addr> [<fru_id> [<LedId>|ALL]]]

     Note: 20 (0x20) is IPMB Address of chassis itself
     Note: Use [clia] fru 20 to obtain FRU list of chassis


       # clia getfruledstate board 1 <--- For blades: getfruledstate board <slot #>
        9a: FRU # 0, Led # 0 ("BLUE LED"):
        Local Control LED State: LED OFF
        9a: FRU # 0, Led # 1 ("LED 1"):
        Local Control LED State: LED OFF
        9a: FRU # 0, Led # 2 ("LED 2"):
        Local Control LED State: LED ON, color: GREEN

      # clia getfruledstate 20 3 <--- For FT: getfruledstate 20 <3|4|5>
       20: FRU # 3, Led # 0 ("BLUE LED"):
       Local Control LED State: LED OFF
       20: FRU # 3, Led # 1 ("LED 1"):
       Local Control LED State: LED OFF
       20: FRU # 3, Led # 2 ("LED 2"):
       Local Control LED State: LED ON, color: GREEN
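
A captured getfruledstate output can be turned into a compact LED summary. The Python sketch below is an illustration only: `parse_led_states` is a hypothetical name, and the regular expressions cover just the "Local Control LED State" lines shown in the examples above; other state lines that the Shelf Manager may print are ignored.

```python
import re

# Copy of the 'clia getfruledstate board 1' example above.
SAMPLE = """\
9a: FRU # 0, Led # 0 ("BLUE LED"):
Local Control LED State: LED OFF
9a: FRU # 0, Led # 1 ("LED 1"):
Local Control LED State: LED OFF
9a: FRU # 0, Led # 2 ("LED 2"):
Local Control LED State: LED ON, color: GREEN
"""

_HEADER = re.compile(r'^\s*([0-9a-fA-F]+): FRU # (\d+), Led # (\d+) \("([^"]*)"\):')
_STATE = re.compile(r"Local Control LED State: LED (\w+)(?:, color: (\w+))?")


def parse_led_states(text):
    """Return one dict per LED with its address, FRU, name, state and color."""
    leds = []
    current = None
    for line in text.splitlines():
        header = _HEADER.match(line)
        if header:
            current = {
                "addr": header.group(1),
                "fru": int(header.group(2)),
                "led": int(header.group(3)),
                "name": header.group(4),
                "state": None,
                "color": None,
            }
            leds.append(current)
            continue
        state = _STATE.search(line)
        if state and current is not None:
            current["state"] = state.group(1)
            current["color"] = state.group(2)  # None when no color is printed
    return leds


if __name__ == "__main__":
    for led in parse_led_states(SAMPLE):
        print(led["name"], led["state"], led["color"])
```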

Sensor Information (Sensor Number, Voltage/Temperature Reading, etc.)

ShMM Console: Use the [clia] sensor command to obtain the list of sensors of a particular
chassis component. Be aware that each R version has its own sensor-number-to-sensor-name
association, so the following example sensor lists (taken from R3U2-RR) are not fixed. If
the customer uses sensor numbers in scripts, the recommendation is to use sensor names
instead.
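
One way to follow the recommendation above is to look up the sensor number by name at run time instead of hard-coding it. The Python sketch below assumes the output of "clia sensor <addr> | grep Sensor" has been captured as text; `sensor_numbers_by_name` is a hypothetical helper, and the sample lines are copied from the chassis sensor list in this document. A name maps to a list because some boards report the same sensor name more than once.

```python
import re
from collections import defaultdict

# A few lines copied from the 'clia sensor 20 | grep Sensor' example.
SAMPLE = """\
20: LUN: 0, Sensor # 123 ("SAP Temp")
20: LUN: 0, Sensor # 150 ("Air Filter")
20: LUN: 0, Sensor # 192 ("PEM A")
20: LUN: 0, Sensor # 193 ("PEM B")
"""

_SENSOR = re.compile(r'^\s*([0-9a-fA-F]+): LUN: (\d+), Sensor # (\d+) \("([^"]*)"\)')


def sensor_numbers_by_name(text):
    """Map each sensor name to the list of sensor numbers that carry it."""
    index = defaultdict(list)
    for line in text.splitlines():
        match = _SENSOR.match(line)
        if match:
            index[match.group(4).strip()].append(int(match.group(3)))
    return dict(index)


if __name__ == "__main__":
    index = sensor_numbers_by_name(SAMPLE)
    print(index["Air Filter"])
```

The sensordata, getthreshold and setthreshold commands also accept a quoted sensor name directly (see their help output), which avoids the lookup altogether when the name is unique.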

       # clia help sensor
       Shows sensor information
       instead of <addr> user may use:
       board <N>
       shm <N>
       to access the sensor on the specified board
       sensor board 21 "IPMB LINK"
       sensor 20 8
       sensor [ <addr> [ [ lun: ]<sensor id> | <sensor name> ] ]


Example list of chassis sensors:

# clia sensor 20 | grep Sensor <---- 20 (0x20) is the IPMB Address of chassis itself
20: LUN: 0, Sensor # 0 ("FRU 0 HOT_SWAP")
20: LUN: 0, Sensor # 2 ("FRU 1 HOT_SWAP")
20: LUN: 0, Sensor # 3 ("FRU 2 HOT_SWAP")
20: LUN: 0, Sensor # 4 ("FRU 8 HOT_SWAP")
20: LUN: 0, Sensor # 5 ("FRU 3 HOT_SWAP")
20: LUN: 0, Sensor # 6 ("FRU 4 HOT_SWAP")
20: LUN: 0, Sensor # 7 ("FRU 5 HOT_SWAP")
20: LUN: 0, Sensor # 8 ("FRU 6 HOT_SWAP")
20: LUN: 0, Sensor # 9 ("FRU 7 HOT_SWAP")
20: LUN: 0, Sensor # 10 ("IPMB LINK 1")
20: LUN: 0, Sensor # 11 ("IPMB LINK 2")
20: LUN: 0, Sensor # 12 ("Fan Tray 0")
20: LUN: 0, Sensor # 13 ("Fan Tray 1")
20: LUN: 0, Sensor # 14 ("Fan Tray 2")
20: LUN: 0, Sensor # 15 ("IPMB LINK 3")
20: LUN: 0, Sensor # 16 ("IPMB LINK 4")
20: LUN: 0, Sensor # 17 ("IPMB LINK 5")
20: LUN: 0, Sensor # 18 ("IPMB LINK 6")
20: LUN: 0, Sensor # 19 ("IPMB LINK 7")
20: LUN: 0, Sensor # 20 ("IPMB LINK 8")
20: LUN: 0, Sensor # 21 ("IPMB LINK 9")
20: LUN: 0, Sensor # 22 ("IPMB LINK 10")
20: LUN: 0, Sensor # 23 ("IPMB LINK 11")
20: LUN: 0, Sensor # 24 ("IPMB LINK 12")
20: LUN: 0, Sensor # 25 ("IPMB LINK 13")
20: LUN: 0, Sensor # 26 ("IPMB LINK 14")
20: LUN: 0, Sensor # 27 ("IPMB LINK 15")
20: LUN: 0, Sensor # 120 ("Center Exhaust")
20: LUN: 0, Sensor # 121 ("Left Exhaust")
20: LUN: 0, Sensor # 122 ("Right Exhaust")
20: LUN: 0, Sensor # 123 ("SAP Temp")
20: LUN: 0, Sensor # 124 ("Temp_In Left")
20: LUN: 0, Sensor # 125 ("Temp_In Center")
20: LUN: 0, Sensor # 126 ("Temp_In Right")
20: LUN: 0, Sensor # 131 ("TELCO Alarms")
20: LUN: 0, Sensor # 132 ("BMC Watchdog")
20: LUN: 0, Sensor # 133 ("SYSTEM EVENT")
20: LUN: 0, Sensor # 150 ("Air Filter")
20: LUN: 0, Sensor # 152 ("SAP")
20: LUN: 0, Sensor # 162 ("PEM A In 2")
20: LUN: 0, Sensor # 163 ("PEM A In 2 Fused")
20: LUN: 0, Sensor # 164 ("PEM A In 1")
20: LUN: 0, Sensor # 165 ("PEM A In 1 Fused")
20: LUN: 0, Sensor # 166 ("PEM A In 4")
20: LUN: 0, Sensor # 167 ("PEM A In 4 Fused")
20: LUN: 0, Sensor # 168 ("PEM A In 3")
20: LUN: 0, Sensor # 169 ("PEM A In 3 Fused")
20: LUN: 0, Sensor # 174 ("PEM B In 2")
20: LUN: 0, Sensor # 175 ("PEM B In 2 Fused")
20: LUN: 0, Sensor # 176 ("PEM B In 1")
20: LUN: 0, Sensor # 177 ("PEM B In 1 Fused")
20: LUN: 0, Sensor # 178 ("PEM B In 4")
20: LUN: 0, Sensor # 179 ("PEM B In 4 Fused")
20: LUN: 0, Sensor # 180 ("PEM B In 3")
20: LUN: 0, Sensor # 181 ("PEM B In 3 Fused")
20: LUN: 0, Sensor # 192 ("PEM A")
20: LUN: 0, Sensor # 193 ("PEM B")
20: LUN: 0, Sensor # 194 ("Shelf EEPROM 1")
20: LUN: 0, Sensor # 195 ("Shelf EEPROM 2")
20: LUN: 0, Sensor # 200 ("PEM A Temp")
20: LUN: 0, Sensor # 201 ("PEM B Temp")
20: LUN: 0, Sensor # 208 ("24V FT 0")
20: LUN: 0, Sensor # 209 ("-48A bus FT 0")
20: LUN: 0, Sensor # 210 ("-48A FT 0")
20: LUN: 0, Sensor # 211 ("-48B bus FT 0")
20: LUN: 0, Sensor # 212 ("-48B FT 0")
20: LUN: 0, Sensor # 213 ("-48A FT 0 Fuse")
20: LUN: 0, Sensor # 214 ("-48B FT 0 Fuse")
20: LUN: 0, Sensor # 215 ("24V FT 1")
20: LUN: 0, Sensor # 216 ("-48A bus FT 1")
20: LUN: 0, Sensor # 217 ("-48A FT 1")
20: LUN: 0, Sensor # 218 ("-48B bus FT 1")
20: LUN: 0, Sensor # 219 ("-48B FT 1")
20: LUN: 0, Sensor # 220 ("-48A FT 1 Fuse")
20: LUN: 0, Sensor # 221 ("-48B FT 1 Fuse")
20: LUN: 0, Sensor # 222 ("24V FT 2")
20: LUN: 0, Sensor # 223 ("-48A bus FT 2")
20: LUN: 0, Sensor # 224 ("-48A FT 2")
20: LUN: 0, Sensor # 225 ("-48B bus FT 2")
20: LUN: 0, Sensor # 226 ("-48B FT 2")
20: LUN: 0, Sensor # 227 ("-48A FT 2 Fuse")
20: LUN: 0, Sensor # 228 ("-48B FT 2 Fuse")
20: LUN: 0, Sensor # 244 ("3V3_RAD")

Example list of ShMM sensors:

# clia sensor shm 1 | grep Sensor
10: LUN: 0, Sensor # 0 ("FRU 0 HOT_SWAP")
10: LUN: 0, Sensor # 1 ("IPMB LINK")
10: LUN: 0, Sensor # 2 ("Local Temp")
10: LUN: 0, Sensor # 3 ("3V3_local")
10: LUN: 0, Sensor # 4 ("I2C_PWR_A")
10: LUN: 0, Sensor # 5 ("I2C_PWR_B")
10: LUN: 0, Sensor # 6 ("VBAT")
10: LUN: 0, Sensor # 7 ("Fan Tach. 0")
10: LUN: 0, Sensor # 8 ("Fan Tach. 1")
10: LUN: 0, Sensor # 10 ("Fan Tach. 2")
10: LUN: 0, Sensor # 11 ("Fan Tach. 3")
10: LUN: 0, Sensor # 13 ("Fan Tach. 4")
10: LUN: 0, Sensor # 14 ("Fan Tach. 5")
10: LUN: 0, Sensor # 15 ("-48A Bus voltage")
10: LUN: 0, Sensor # 16 ("-48B Bus voltage")
10: LUN: 0, Sensor # 17 ("-48A ACB voltage")
10: LUN: 0, Sensor # 18 ("-48B ACB voltage")
10: LUN: 0, Sensor # 19 ("-48A ACB Fuse")
10: LUN: 0, Sensor # 20 ("-48B ACB Fuse")
10: LUN: 0, Sensor # 128 ("CPLD State")


Example list of PEM sensors:

# clia sensor 20 | grep PEM
20: LUN: 0, Sensor # 162 ("PEM A In 2")
20: LUN: 0, Sensor # 163 ("PEM A In 2 Fused")
20: LUN: 0, Sensor # 164 ("PEM A In 1")
20: LUN: 0, Sensor # 165 ("PEM A In 1 Fused")
20: LUN: 0, Sensor # 166 ("PEM A In 4")
20: LUN: 0, Sensor # 167 ("PEM A In 4 Fused")
20: LUN: 0, Sensor # 168 ("PEM A In 3")
20: LUN: 0, Sensor # 169 ("PEM A In 3 Fused")
20: LUN: 0, Sensor # 174 ("PEM B In 2")
20: LUN: 0, Sensor # 175 ("PEM B In 2 Fused")
20: LUN: 0, Sensor # 176 ("PEM B In 1")
20: LUN: 0, Sensor # 177 ("PEM B In 1 Fused")
20: LUN: 0, Sensor # 178 ("PEM B In 4")
20: LUN: 0, Sensor # 179 ("PEM B In 4 Fused")
20: LUN: 0, Sensor # 180 ("PEM B In 3")
20: LUN: 0, Sensor # 181 ("PEM B In 3 Fused")
20: LUN: 0, Sensor # 192 ("PEM A")
20: LUN: 0, Sensor # 193 ("PEM B")
20: LUN: 0, Sensor # 200 ("PEM A Temp")
20: LUN: 0, Sensor # 201 ("PEM B Temp")


Example list of Switch (CP3140) sensors:

# clia sensor board 7 | grep Sensor <---- Switch blades are in slots 7 & 8
82: LUN: 0, Sensor # 0 ("FRU 0 HOT_SWAP")
82: LUN: 0, Sensor # 1 ("IPMB LINK")
82: LUN: 0, Sensor # 2 ("-48V ALARM")
82: LUN: 0, Sensor # 3 ("RTM Present")
82: LUN: 0, Sensor # 4 ("OOS LED")
82: LUN: 0, Sensor # 5 ("ACTIVE LED")
82: LUN: 0, Sensor # 6 ("5V")
82: LUN: 0, Sensor # 7 ("3.3V")
82: LUN: 0, Sensor # 8 ("2.5V")
82: LUN: 0, Sensor # 9 ("1.5V")
82: LUN: 0, Sensor # 10 ("1.25V")
82: LUN: 0, Sensor # 11 ("Board Temp1")
82: LUN: 0, Sensor # 12 ("Board Temp2")
82: LUN: 0, Sensor # 13 ("BMC Watchdog")


Example list of Switch (CP3240) sensors:

# clia sensor board 7 | grep Sensor <---- Switch blades are located in slots 7 & 8
82: LUN: 0, Sensor # 0 ("Hot Swap")
82: LUN: 0, Sensor # 2 ("Hot Swap AMC #1")
82: LUN: 0, Sensor # 3 ("Hot Swap AMC #2")
82: LUN: 0, Sensor # 4 ("Hot Swap AMC #3")
82: LUN: 0, Sensor # 5 ("+12.0V")
82: LUN: 0, Sensor # 6 ("+3.3V")
82: LUN: 0, Sensor # 7 ("+2.5V")
82: LUN: 0, Sensor # 8 ("+1.25V")
82: LUN: 0, Sensor # 9 ("IPMB Physical")
82: LUN: 0, Sensor # 10 ("Base CPU Temp")
82: LUN: 0, Sensor # 12 ("RTM Presence")
82: LUN: 0, Sensor # 13 ("Base Early")
82: LUN: 0, Sensor # 14 ("Base Full")
82: LUN: 0, Sensor # 15 ("Base Good")
82: LUN: 0, Sensor # 16 ("Fabric Early")
82: LUN: 0, Sensor # 17 ("Fabric Full")
82: LUN: 0, Sensor # 18 ("Fabric Good")
82: LUN: 0, Sensor # 19 ("BMC Watchdog")
82: LUN: 0, Sensor # 20 ("Fabric CPU Temp")
82: LUN: 0, Sensor # 21 ("+1.5V")
82: LUN: 0, Sensor # 22 ("+1.8V")
82: LUN: 0, Sensor # 23 ("+1.0V")
82: LUN: 0, Sensor # 24 ("+1.2V")
82: LUN: 0, Sensor # 25 ("Site 1 PWR cur")
82: LUN: 0, Sensor # 26 ("Site 1 PWR")
82: LUN: 0, Sensor # 27 ("Site 1 MP")
82: LUN: 0, Sensor # 28 ("Site 2 PWR cur")
82: LUN: 0, Sensor # 29 ("Site 2 PWR")
82: LUN: 0, Sensor # 30 ("Site 2 MP")
82: LUN: 0, Sensor # 31 ("Site 3 PWR cur")
82: LUN: 0, Sensor # 32 ("Site 3 PWR")
82: LUN: 0, Sensor # 33 ("Site 3 MP")
82: LUN: 0, Sensor # 34 ("+3.3V STBY")
82: LUN: 0, Sensor # 35 ("+12V")
82: LUN: 0, Sensor # 36 ("DS75 Temp")
82: LUN: 0, Sensor # 37 ("AD7417 Temp")
82: LUN: 0, Sensor # 38 ("+1.2V")
82: LUN: 0, Sensor # 39 ("+1.8V")
82: LUN: 0, Sensor # 40 ("+3.3V")
82: LUN: 0, Sensor # 41 ("+5V")
82: LUN: 0, Sensor # 42 ("+3.3V STBY")
82: LUN: 0, Sensor # 43 ("+12V")
82: LUN: 0, Sensor # 44 ("DS75 Temp")
82: LUN: 0, Sensor # 45 ("AD7417 Temp")
82: LUN: 0, Sensor # 46 ("+1.2V")
82: LUN: 0, Sensor # 47 ("+1.8V")
82: LUN: 0, Sensor # 48 ("+3.3V")
82: LUN: 0, Sensor # 49 ("+5V")
82: LUN: 0, Sensor # 50 ("+3.3V STBY")
82: LUN: 0, Sensor # 51 ("+12V")
82: LUN: 0, Sensor # 52 ("DS75 Temp")
82: LUN: 0, Sensor # 53 ("AD7417 Temp")
82: LUN: 0, Sensor # 54 ("+1.2V")
82: LUN: 0, Sensor # 55 ("+2.5V")
82: LUN: 0, Sensor # 56 ("+3.3V")


NOTE: Use [clia] board <slot #> to identify the blade in that slot

Example of CP3060 Sensors:

# clia sensor board 3 | grep Sensor
92: LUN: 0, Sensor # 0 ("FRU 0 Hot Swap")
92: LUN: 0, Sensor # 1 ("RTM Hot Swap")
92: LUN: 0, Sensor # 2 ("HotSwap AMC 0")
92: LUN: 0, Sensor # 3 ("IPMB Physical")
92: LUN: 0, Sensor # 4 ("BMC Watchdog")
92: LUN: 0, Sensor # 5 ("CPU Temp1")
92: LUN: 0, Sensor # 6 ("CPU Temp2")
92: LUN: 0, Sensor # 7 ("Board Temp")
92: LUN: 0, Sensor # 8 ("12.0V")
92: LUN: 0, Sensor # 9 ("5.0V")
92: LUN: 0, Sensor # 10 ("3.3V")
92: LUN: 0, Sensor # 11 ("3.3V STBY")
92: LUN: 0, Sensor # 12 ("2.5V STBY")
92: LUN: 0, Sensor # 13 ("1.0V")
92: LUN: 0, Sensor # 14 ("1.2V CPU")
92: LUN: 0, Sensor # 15 ("1.2V")
92: LUN: 0, Sensor # 16 ("1.5V")
92: LUN: 0, Sensor # 17 ("0.9V VTTL")
92: LUN: 0, Sensor # 18 ("0.9V VTTR")
92: LUN: 0, Sensor # 19 ("1.8V DDR2L")
92: LUN: 0, Sensor # 20 ("1.8V DDR2R")
92: LUN: 0, Sensor # 21 ("2.5V")
92: LUN: 0, Sensor # 22 ("1.2V STBY")
92: LUN: 0, Sensor # 23 ("AMC 12V")
92: LUN: 0, Sensor # 24 ("AMC 3.3V")
92: LUN: 0, Sensor # 25 ("RTM Presence")
92: LUN: 0, Sensor # 26 ("Version change")
92: LUN: 0, Sensor # 27 ("+3.3V")
92: LUN: 0, Sensor # 28 ("+5V")
92: LUN: 0, Sensor # 29 ("+12V")
92: LUN: 0, Sensor # 30 ("LM60 Temp")
92: LUN: 0, Sensor # 31 ("DS75 Temp")
92: LUN: 0, Sensor # 32 ("BMC Watchdog")

Example of CP3250 Sensors:

# clia sensor board 6 | grep Sensor
86: LUN: 0, Sensor # 0 ("FRU 0 Hot Swap")
86: LUN: 0, Sensor # 3 ("IPMB Physical")
86: LUN: 0, Sensor # 4 ("BMC Watchdog")
86: LUN: 0, Sensor # 5 ("12.0V")
86: LUN: 0, Sensor # 6 ("5.0V")
86: LUN: 0, Sensor # 7 ("3.3V")
86: LUN: 0, Sensor # 8 ("3.3V STBY")
86: LUN: 0, Sensor # 9 ("SuperCAP voltage")
86: LUN: 0, Sensor # 10 ("1.2V NTune")
86: LUN: 0, Sensor # 11 ("CPU VTT")
86: LUN: 0, Sensor # 12 ("1.5 V")
86: LUN: 0, Sensor # 13 ("1.8 V")
86: LUN: 0, Sensor # 14 ("DDR2 VTT")
86: LUN: 0, Sensor # 15 ("1.05 V Core")
86: LUN: 0, Sensor # 16 ("1.5 V NTune")
86: LUN: 0, Sensor # 17 ("VCC CPU1")
86: LUN: 0, Sensor # 18 ("VCC CPU0")
86: LUN: 0, Sensor # 19 ("Inlet 1 Temp Sen")
86: LUN: 0, Sensor # 20 ("Inlet 3 Temp Sen")
86: LUN: 0, Sensor # 21 ("Inlet 2 Temp Sen")
86: LUN: 0, Sensor # 22 ("MCH Temp Sensor")
86: LUN: 0, Sensor # 23 ("CPU_TEMP_SK0D0")
86: LUN: 0, Sensor # 24 ("CPU_TEMP_SK0D1")
86: LUN: 0, Sensor # 25 ("CPU_TEMP_SK1DO")
86: LUN: 0, Sensor # 26 ("CPU_TEMP_SK1D1")
86: LUN: 0, Sensor # 27 ("Version change")
86: LUN: 0, Sensor # 28 ("System Event")
86: LUN: 0, Sensor # 29 ("CPU 0 presence")
86: LUN: 0, Sensor # 30 ("CPU 1 presence")
86: LUN: 0, Sensor # 31 ("P48V Alarm")
86: LUN: 0, Sensor # 32 ("Sys fw progress")
86: LUN: 0, Sensor # 33 ("Graceful reboot")

# clia sensor board 4 | grep Sensor (this is for a CP3270)

8e: LUN: 0, Sensor # 0 ("FRU 0 Hot Swap")
8e: LUN: 0, Sensor # 3 ("IPMB Physical")
8e: LUN: 0, Sensor # 4 ("BMC Watchdog")
8e: LUN: 0, Sensor # 5 ("CPU0 Temp")
8e: LUN: 0, Sensor # 6 ("CPU1 Temp")
8e: LUN: 0, Sensor # 7 ("Vbat")
8e: LUN: 0, Sensor # 8 ("3.3V STBY")
8e: LUN: 0, Sensor # 9 ("12.0 V")
8e: LUN: 0, Sensor # 10 ("5.0 V")
8e: LUN: 0, Sensor # 11 ("3.3 V")
8e: LUN: 0, Sensor # 12 ("PCH 1.05 V")
8e: LUN: 0, Sensor # 13 ("CPU0 DDR3 1.5V")
8e: LUN: 0, Sensor # 14 ("CPU1 DDR3 1.5V")
8e: LUN: 0, Sensor # 15 ("CPU0 DDR3 0.75V ")
8e: LUN: 0, Sensor # 16 ("CPU1 DDR3 0.75V")
8e: LUN: 0, Sensor # 17 ("CPU0 VTT")
8e: LUN: 0, Sensor # 18 ("CPU1 VTT")
8e: LUN: 0, Sensor # 19 ("CPU0 VCCP")
8e: LUN: 0, Sensor # 20 ("CPU1 VCCP")
8e: LUN: 0, Sensor # 21 ("Version Change")
8e: LUN: 0, Sensor # 22 ("System Event")
8e: LUN: 0, Sensor # 23 ("CPU 0 presence")
8e: LUN: 0, Sensor # 24 ("CPU 1 presence")
8e: LUN: 0, Sensor # 25 ("P48V Alarm")
8e: LUN: 0, Sensor # 26 ("Sys fw progress")
8e: LUN: 0, Sensor # 27 ("Graceful reboot")
8e: LUN: 0, Sensor # 28 ("Thermal Trip")


Once the sensor is identified, use the [clia] sensordata command to obtain detailed
information about that particular sensor.

       # clia help sensordata
       Shows sensor data
       instead of <addr> user may use:
       board <N>
       shm <N>
       to access the sensor on the specified board
       (only sensors with thresholds crossed if -t is given)
       sensordata board 21 "IPMB LINK"
       sensordata 20 8
       sensordata [-t] [ <addr> [ [ lun: ]<sensor id> | <sensor name> ] ]
   
To check the thresholds of a sensor, use the [clia] getthreshold command; to change a
threshold, use the [clia] setthreshold command.

       # clia help getthreshold
       Shows the threshold of the specified sensor
       instead of <addr> user may use:
       board <N>
       shm <N>
       to access the sensor on the specified board
      getthreshold board 21 "IPMB LINK"
      getthreshold 20 8
      getthreshold [ <addr> [ [ lun: ]<sensor id> | <sensor name> ] ]

      # clia help setthreshold
      Set the specified threshold of the dedicated sensor
      unc - Upper Non Critical
      uc - Upper Critical
      unr - Upper Non Recoverable
      lnc - Lower Non Critical
      lc - Lower Critical
      lnr - Lower Non Recoverable
      instead of <addr> user may use:
      board <N>
      shm <N>
      to access the sensor on the specified board
      "-r <value>" considers <value> as unsigned byte
      just "<value>" considers as the floating point number
      setthreshold board 21 "IPMB LINK" unc -r 34
      setthreshold 20 8 lc -45.67
      setthreshold <addr> [ lun: ]<sensor_id> | <sensor name> unc | uc | unr | lnc | lc | lnr
      [-r] value

This is how the threshold levels are defined:

IPMI Level        PICMG 3.0   Telco      Meaning
Non-Critical      Minor       Minor      Sensor out of normal range, but not yet a problem (Warning)
Critical          Major       Major      Sensor well out of normal range, but still within vendor operating tolerances
Non-Recoverable   Critical    Critical   Sensor out of vendor operating tolerance range, equipment may be damaged

Temperature Sensor Example:

       # clia sensordata board 3 5
       92: LUN: 0, Sensor # 5 ("CPU Temp1")
       Type: Threshold (0x01), "Temperature" (0x01)
       Status: 0xc0
       All event messages enabled from this sensor
       Sensor scanning enabled
       Initial update completed
       Raw data: 54 (0x36)
       Processed data: 54.000000 degrees C <---- This is the reading of sensor
       Status: 0x00

      # clia getthreshold board 3 5
      92: LUN: 0, Sensor # 5 ("CPU Temp1")
      Type: Threshold (0x01), "Temperature" (0x01)
      Upper Non-Critical Threshold, Raw Data: 0x50 Processed data: 80.000000 degrees C
      Upper Critical Threshold, Raw Data: 0x5a Processed data: 90.000000 degrees C
      Upper Non-Recoverable Threshold, Raw Data: 0x66 Processed data: 102.000000 degrees C


Voltage Example:

       # clia sensordata board 3 14
       92: LUN: 0, Sensor # 14 ("1.2V CPU")
       Type: Threshold (0x01), "Voltage" (0x02)
       Status: 0xc0
       All event messages enabled from this sensor
       Sensor scanning enabled
       Initial update completed
       Raw data: 124 (0x7c)
       Processed data: 1.215200 Volts <---- This is the reading of sensor
       Status: 0x00

      # clia getthreshold board 3 14
       92: LUN: 0, Sensor # 14 ("1.2V CPU")
       Type: Threshold (0x01), "Voltage" (0x02)
       Lower Non-Critical Threshold, Raw Data: 0x75 Processed data: 1.146600 Volts
       Lower Critical Threshold, Raw Data: 0x72 Processed data: 1.117200 Volts
       Lower Non-Recoverable Threshold, Raw Data: 0x6e Processed data: 1.078000 Volts
       Upper Non-Critical Threshold, Raw Data: 0x81 Processed data: 1.264200 Volts
       Upper Critical Threshold, Raw Data: 0x84 Processed data: 1.293600 Volts
       Upper Non-Recoverable Threshold, Raw Data: 0x88 Processed data: 1.332800 Volts
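
The readings and thresholds in the two examples above can be compared programmatically. The Python sketch below parses getthreshold output and reports the Telco severity from the threshold level table for a given reading. It is an illustration under stated assumptions: `parse_thresholds` and `classify` are hypothetical names, and treating a reading equal to a threshold as crossed is an assumption, not something this document states.

```python
import re

# Threshold lines copied from the two getthreshold examples above.
TEMP_THRESHOLDS = """\
Upper Non-Critical Threshold, Raw Data: 0x50 Processed data: 80.000000 degrees C
Upper Critical Threshold, Raw Data: 0x5a Processed data: 90.000000 degrees C
Upper Non-Recoverable Threshold, Raw Data: 0x66 Processed data: 102.000000 degrees C
"""
VOLT_THRESHOLDS = """\
Lower Non-Critical Threshold, Raw Data: 0x75 Processed data: 1.146600 Volts
Lower Critical Threshold, Raw Data: 0x72 Processed data: 1.117200 Volts
Lower Non-Recoverable Threshold, Raw Data: 0x6e Processed data: 1.078000 Volts
Upper Non-Critical Threshold, Raw Data: 0x81 Processed data: 1.264200 Volts
Upper Critical Threshold, Raw Data: 0x84 Processed data: 1.293600 Volts
Upper Non-Recoverable Threshold, Raw Data: 0x88 Processed data: 1.332800 Volts
"""

# IPMI level -> Telco severity, from the threshold level table.
TELCO = {"Non-Critical": "Minor", "Critical": "Major", "Non-Recoverable": "Critical"}
_MOST_SEVERE_FIRST = ["Non-Recoverable", "Critical", "Non-Critical"]
_LINE = re.compile(
    r"(Upper|Lower) (Non-Critical|Critical|Non-Recoverable) Threshold,"
    r".*Processed data: (-?\d+(?:\.\d+)?)"
)


def parse_thresholds(text):
    """Return {(direction, level): value} for every threshold line found."""
    return {(m.group(1), m.group(2)): float(m.group(3)) for m in _LINE.finditer(text)}


def classify(reading, thresholds):
    """Return 'OK' or the Telco severity of the worst threshold crossed."""
    for level in _MOST_SEVERE_FIRST:
        upper = thresholds.get(("Upper", level))
        lower = thresholds.get(("Lower", level))
        if upper is not None and reading >= upper:
            return TELCO[level]
        if lower is not None and reading <= lower:
            return TELCO[level]
    return "OK"


if __name__ == "__main__":
    print(classify(54.0, parse_thresholds(TEMP_THRESHOLDS)))
    print(classify(1.2152, parse_thresholds(VOLT_THRESHOLDS)))
```

With the values from the examples, both the 54 degrees C CPU temperature and the 1.2152 V reading fall inside all thresholds.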

Component Absent/Present Example:

      # clia sensor 20 | grep Air
      20: LUN: 0, Sensor # 150 ("Air Filter")

      # clia sensordata 20 150
      20: LUN: 0, Sensor # 150 ("Air Filter")
      Type: Discrete (0x6f), "Entity Presence" (0x25)
     Status: 0xc0
     All event messages enabled from this sensor
     Sensor scanning enabled
     Initial update completed
     Sensor reading: 0x00
     Current State Mask 0x0001
     Entity Present <---- The Air Filter is in place
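
The "Current State Mask" of a discrete sensor is a bit field, and the bit positions can be decoded as shown in the Python sketch below. Bit 0 meaning "Entity Present" matches the example above; the meanings of bits 1 and 2 are taken from the generic IPMI sensor type table for Entity Presence (0x25) and are an assumption as far as this platform is concerned. The name `decode_state_mask` is hypothetical.

```python
# Bit offsets for IPMI sensor type 0x25 (Entity Presence).
# Offset 0 is confirmed by the example above; offsets 1 and 2 are assumed
# from the generic IPMI sensor type definitions.
ENTITY_PRESENCE_STATES = {
    0: "Entity Present",
    1: "Entity Absent",
    2: "Entity Disabled",
}


def decode_state_mask(mask, states=ENTITY_PRESENCE_STATES):
    """Return the names of all states whose bit is set in the mask."""
    return [name for bit, name in sorted(states.items()) if mask & (1 << bit)]


if __name__ == "__main__":
    print(decode_state_mask(0x0001))
```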

Switch Commands:

Switch Console: use the following commands to report the configuration of the switch.  If additional information is needed, refer to the switch manual for more show command details.

Command                    Description
show running-config        Report current switch configuration
show hardware              Switch firmware version
show network               Outbound setting of NET MGT port
show serviceport           Inbound setting of NET MGT port
show mac-addr-table        MAC address table learned from all ports
show port all              Status of all ports - physical and logical
show interface             Brief statistics of a particular interface
show interface ethernet    Detailed statistics of a particular interface
show vlan all              List of configured VLANs
ping                       Ping chassis internal or external components/systems

FASTPATH is a mode-based command line interface. The commands in one mode are not available until the operator switches to that particular mode. Enter ? at the CLI prompt to display a list of available commands and descriptions. Press TAB at the CLI prompt to complete the command, unless the command is not yet unique at that point. Exit the current level using the exit command. Many different modes are available; see the chart at the end of the document to determine which mode may be needed.

FASTPATH supports multiple users with different security levels. By default, there is one admin user with no password. In the CLI, privilege mode is password-protected separately from the default mode, but also has a default of no password.

Below are the most often used modes.

  • User Exec Mode

    • Initial mode when user first logs in to a switch

    • Use username “admin” to enter:

      User: admin <cr>

      Password: <Default: No password> <cr> <cr>

      (ATS1160 Base) >

  • Privileged Exec Mode

    • Switch to this mode from User Exec mode with the enable command

    • Type exit or press Ctrl-Z to exit to the previous mode

      (ATS1160 Base) > enable

      password: <password > [default = admin]

      (ATS1160 Base) #

  • Global Configure Mode

    • Switch to this mode from Privileged Exec mode with the configure command

    • Type exit or press Ctrl-Z to exit to the previous mode

    • Allows user to configure switch parameters

      (ATS1160 Base) # configure

      (ATS1160 Base) (Config) #


The show command has more arguments/options; use show ? to view the entire list with
brief explanations.

CP3240 Examples (CP3140 output is different):

(CP3240H-BEX-Z Base) # show hardware
Switch: 1
System Description............................. CP3240H-BEX-Z Base
Machine Type................................... CP3240H-BEX-Z
Machine Model.................................. CP3240H-BEX-Z
Serial Number.................................. 1544DTI-0742330074
FRU Number..................................... 375-3523-01
Part Number.................................... 375-3523-01
Maintenance Level.............................. A
Manufacturer................................... 0x34b7
Burned In MAC Address.......................... 00:20:13:F1:0E:6D
Software Version............................... 1.3.3.0
Operating System............................... Linux 2.4.20_mvl31
Network Processing Device...................... BCM56504 REV 1
Additional Packages............................ FASTPATH QOS
                                                FASTPATH Multicast
                                                FASTPATH IPv6

(CP3240H-BEX-Z Base) # show port all
                Admin    Physical   Physical    Link     Link     LACP     Actor
Intf    Type    Mode     Mode       Status      Status   Trap     Mode     Timeout
------  ------  -------  ---------  ----------  -------  -------  -------  -------
0/1             Enable   Auto       100 Full    U-Up     Enable   Enable   short
0/2             Enable   Auto                   D-Down   Enable   Enable   short
0/3             Enable   Auto       1000 Full   U-Up     Enable   Enable   short
0/4             Enable   Auto                   U-Down   Enable   Enable   short
0/5             Enable   Auto                   D-Down   Enable   Enable   short
0/6             Enable   Auto                   D-Down   Enable   Enable   short
0/7             Enable   Auto                   D-Down   Enable   Enable   short
0/8             Enable   Auto       100 Full    U-Up     Enable   Enable   short
0/9             Enable   Auto       1000 Full   U-Up     Enable   Enable   short
0/10            Enable   Auto       100 Full    U-Up     Enable   Enable   short
0/11            Enable   Auto       1000 Full   U-Up     Enable   Enable   short
0/12            Enable   Auto       1000 Full   U-Up     Enable   Enable   short
0/13            Enable   Auto       1000 Full   U-Up     Enable   Enable   short
0/14            Enable   Auto       1000 Full   U-Up     Enable   Enable   short
0/15            Enable   Auto                   D-Down   Enable   Enable   short
0/16            Enable   Auto                   D-Down   Enable   Enable   short
0/17            Enable   Auto                   Down     Enable   Enable   short
0/18            Enable   Auto       1000 Full   Up       Enable   Enable   short
0/19            Enable   Auto                   Down     Enable   Enable   short
0/20            Enable   Auto                   Down     Enable   Enable   short
0/21            Enable   Auto                   Down     Enable   Enable   short
0/22            Enable   Auto                   Down     Enable   Enable   short
0/23            Enable   Auto                   Down     Enable   Enable   short
0/24            Enable   Auto                   Down     Enable   Enable   short
0/25            Enable   Auto                   Down     Enable   Enable   short
0/26            Enable   10G Full               Down     Enable   Enable   short
0/27            Enable   10G Full               Down     Enable   Enable   short

(CP3240H-BEX-Z Base) # show interface 0/18
Packets Received Without Error    .................   74945744
Packets Received With Error    ....................     0
Broadcast Packets Received   .....................     72354102
Packets Transmitted Without Errors.............     339215
Transmit Packet Errors         .........................     0
Collision Frames             ...............................      0
Time Since Counters Last Cleared  ...............  12 day 21 hr 34 min 57 sec

(CP3240H-BEX-Z Base) # ping 10.5.56.1
Send count=3, Receive count=3 from 10.5.56.1

Troubleshooting

Tips:

ShMM Console is not responding
  • Symptom: The ShMM console is connected through a terminal concentrator/server and does not respond to keystrokes; this sometimes happens when the baud rate is 115200.
  • Possible Cause: Configuration
  • Solution: Reset the baud rate to 9600, then set it back to 115200

ShMM is in a booting cycle
  • Symptom: One of the ShMMs is in a boot loop: it comes up to the login prompt without any problem, but soon reboots itself.
  • Possible Causes:
  1. ShMM firmware versions do not match: the ShMM with the lower version is Active or is shm1
  2. ShMM is not set to use the Sun environment
  3. Power problem
  • Solutions:
  1. Upgrade the ShMMs to the same firmware version
  2. Check the rc2 U-Boot variable (use the [clia] getenv rc2 command) for "rc2=/etc/rc.acb3"
  3. See “Power related problem” below.

NetConsole related problems
  • Symptom: NetConsole hangs or gives a “cannot connect” type of output
  • Possible Causes:
  1. Incorrect ShMM Setup
  2. Incorrect Switch Setup
  3. Incorrect Blade Setup
  • Solutions:
  1. Check /etc/openhpi.conf and make sure the addr field is set to the RMCP address. Also double check that the VLAN address is the fixed value 192.168.13.109; depending on how the ShMM network interfaces are set, there might be a need to look into rc_ifconfig and related settings
  2. Make sure VLAN 55 is up and running on the BASE fabric of the switch blade
  3. There might be a need to set the ALOM/ILOM IP to the corresponding IP based on the ShMM /var/netcons.ip file

FT at full speed without chassis temperature event
  • Symptom: Fan Trays run at full speed (Level 15) without a thermal event from the following sensors: (0x20,124,0) (0x20,125,0) (0x20,126,0) (0x20,200,0) (0x20,201,0)
  • Possible Cause: Power problem or dirty air filter
  • Solution: Check “Power related problem” below or clean the air filter

Power related problem
  • Symptom: Chassis is in an unstable condition (symptoms may include: A. Components [blade, switch, or ShMM] do not power up at all; B. Components are in a boot loop; etc.) and fan trays are running at full speed (Level 15).
  • Possible Causes:
  1. Bad PEM
  2. Not enough power from DC source
  3. Incorrect grade of DC wire
  • Solution: Check all PEM related sensors to see whether a component/connection is present or absent, and check the PICMG Shelf Power Distribution Record section (or use the command [clia] shelf pd) to see whether all problematic components are fed by the same power feed. Customer needs to check the DC source and whether the DC wire is of the correct grade.

Switch is not working as expected
  • Symptom: Customer set up the switch blades, but they do not work as expected (for example, a VLAN or STP problem)
  • Possible Causes:
  1. VLAN 1 is missing
  2. Switch setting problem
  3. Bad switch (CP3x40)
  4. Problem with external switch
  • Solution: Do the following:
  1. Re-seat or reboot (the command is reload) the switch blade
  2. Make sure VLAN 1 is in place and the interconnect between the two switch blades (0/2 for BASE and 0/1 for FABRIC) is NOT included in any VLAN
  3. Collect network topology related information
  4. Ask Customer to check the external switch as well
  5. Send a collaboration task to Network team

/tmp/debug.log file
  • The ShMM /tmp/debug.log file is the equivalent of an SP snapshot or a server Explorer
  • Each session is separated by a >>> <Some Text> header
  • Generated by executing the /etc/summary script
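
Because every session starts with its own header, a saved copy of the file can be split into named sessions for easier reading. This is a minimal Python sketch under the assumption that headers look like the ">>> Shelfman version" lines quoted in the Q&A further down; the function name is hypothetical and the real file may contain header variants that this pattern does not cover.

```python
import re

# Assumption: a session header is a line of the form ">>> <Some Text>".
HEADER = re.compile(r"^\s*>>>\s*(.+?)\s*$")


def split_sections(text):
    """Return {header text: [lines of that session]} in file order."""
    sections = {}
    current = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            current = match.group(1)
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections


SAMPLE = """\
>>> Shelfman version
Pigeon Point Shelf Manager ver. 2.4.9-R3U2-RR
>>> ShMC IPMB Address
Local IPMB Address = 0x10
"""

sections = split_sections(SAMPLE)
print(list(sections))
print(sections["ShMC IPMB Address"])
```

With the file split this way, each question in the Q&A below becomes a lookup of one session by name.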
M States:

M0 : FRU not installed
M1 : FRU is Inactive
M2 : FRU activation request
M3 : FRU activation in progress
M4 : FRU active
M5 : FRU deactivation request
M6 : FRU deactivation in progress
M7 : Communication Lost - abnormal state, ShMM unable to communicate with the FRU


Cause Table:

0x0: Normal State change
0x1: Change commanded by shelf manager
0x2: State change due to operator changing handle switch (latch/delatch)
0x3: State change due to programmatic action
0x4: Communication Lost or Regained
0x5: Communication Lost or Regained - locally detected
0x6: Surprise State change due to extraction
0x7: State Change due to provided information (valid for M7 to M0 transition)
0x8: Invalid Hardware Address detected
0x9: Unexpected Deactivation (Valid for M4 to M6 transition)
0xA: Surprise State change due to power failure
0xF: State Change - Cause Unknown - No cause could be determined
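
The two tables above can be combined to turn a hot swap entry such as "HotSwap: FRU 0 M4->M7, Cause=0x4" into plain text. The following Python sketch transcribes the tables into dictionaries with slightly shortened wording; the function name and the output wording are illustrative assumptions, not shelf manager output.

```python
import re

# Hot swap states and cause codes, transcribed from the tables above.
M_STATES = {
    0: "FRU not installed",
    1: "FRU is inactive",
    2: "FRU activation request",
    3: "FRU activation in progress",
    4: "FRU active",
    5: "FRU deactivation request",
    6: "FRU deactivation in progress",
    7: "Communication lost",
}
CAUSES = {
    0x0: "Normal state change",
    0x1: "Change commanded by shelf manager",
    0x2: "Operator changed handle switch",
    0x3: "Programmatic action",
    0x4: "Communication lost or regained",
    0x5: "Communication lost or regained (locally detected)",
    0x6: "Surprise state change due to extraction",
    0x7: "State change due to provided information",
    0x8: "Invalid hardware address detected",
    0x9: "Unexpected deactivation",
    0xA: "Surprise state change due to power failure",
    0xF: "Cause unknown",
}

HOTSWAP = re.compile(r"FRU (\d+) M(\d)->M(\d), Cause=0x([0-9A-Fa-f]+)")


def describe_hotswap(text):
    """Return a readable description of a 'HotSwap:' detail, or None."""
    match = HOTSWAP.search(text)
    if match is None:
        return None
    fru, prev, cur = (int(match.group(i)) for i in (1, 2, 3))
    cause = int(match.group(4), 16)
    return (
        f"FRU {fru}: M{prev} ({M_STATES.get(prev, 'unknown')}) -> "
        f"M{cur} ({M_STATES.get(cur, 'unknown')}); "
        f"cause 0x{cause:X} ({CAUSES.get(cause, 'not in the Cause Table')})"
    )


print(describe_hotswap("HotSwap: FRU 0 M4->M7, Cause=0x4"))
```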

The /tmp/debug.log file contains booting information (the /var/log/messages file).
Because the ShMM file system is set up on Flash, booting information is erased with each
reboot. Therefore, if the problem is related to ShMM booting, ask Customer to
generate a fresh /tmp/debug.log file.

Here is a Q&A session on how to read the /tmp/debug.log file.

Q: How to find ShMM Firmware version?
A: Find the >>>Shelfman Version Session;
   >>> Shelfman version
    Pigeon Point Shelf Manager ver. 2.4.9-R3U2-RR
    Pigeon Point is a trademark of Pigeon Point Systems.
    Copyright (c) 2002-2007 Pigeon Point Systems
     All rights reserved
     Build date/time: Mar 27 2009 08:33:42
     Carrier: ACB; Subtype: 3; Subversion: 1

Q: How to identify whether the file is from shm1 or shm2?
A: Find the >>>ShMC IPMB Address Session;
   >>> ShMC IPMB Address
    Local IPMB Address = 0x10
    0x10 is shm1 (upper), and 0x12 is shm2 (bottom).

Q: How to identify the blades installed in chassis?
A: Check the >>>Board Information Session;
   >>> Board Information
    Physical Slot # 1
    9a: Entity: (0xa0, 0x60) Maximum FRU device ID: 0x01
     PICMG Version 2.2
     Hot Swap State: M4 (Active), Previous: M3 (Activation In Process), Last State
     Change Cause: Normal State Change (0x0)
     9a: FRU # 0
     Entity: (0xa0, 0x60)
     Hot Swap State: M4 (Active), Previous: M3 (Activation In Process), Last State
     Change Cause: Normal State Change (0x0)
     Device ID String: "Netra CP3260"
     9a: FRU # 1
     Entity: (0xc1, 0x6f)
     Hot Swap State: M4 (Active), Previous: M3 (Activation In Process), Last State
    Change Cause: Normal State Change (0x0)
    Device ID String: "CP32X0-RTM-HDD"
    ... Physical Slot # 7
    82: Entity: (0xa0, 0x60) Maximum FRU device ID: 0x04
    PICMG Version 2.2
    Hot Swap State: M4 (Active), Previous: M3 (Activation In Process), Last State
    Change Cause: Normal State Change (0x0)
    82: FRU # 0
    Entity: (0xa0, 0x60)
    Hot Swap State: M4 (Active), Previous: M3 (Activation In Process), Last State
    Change Cause: Normal State Change (0x0)
    Device ID String: "CP3240H-BEX-Z"
    82: FRU # 2
    Entity: (0xc1, 0x65)
    Hot Swap State: M4 (Active), Previous: M3 (Activation In Process), Last State
    Change Cause: Normal State Change (0x0)
    Device ID String: "AMC-XFP"
    82: FRU # 3
    Entity: (0xc1, 0x66)
    Hot Swap State: M4 (Active), Previous: M3 (Activation In Process), Last State
    Change Cause: Normal State Change (0x0)
    Device ID String: "AMC-XFP"
    82: FRU # 4
    Entity: (0xc1, 0x67)
    Hot Swap State: M4 (Active), Previous: M3 (Activation In Process), Last State
    Change Cause: Normal State Change (0x0)
    Device ID String: "AMC10G-CX4"
    ... Physical Slot # 14
    9c: Entity: (0xa0, 0x60) Maximum FRU device ID: 0x02
    PICMG Version 2.2
    Hot Swap State: M4 (Active), Previous: M3 (Activation In Process), Last State
    Change Cause: Normal State Change (0x0)
    9c: FRU # 0
    Entity: (0xa0, 0x60)
    Hot Swap State: M4 (Active), Previous: M3 (Activation In Process), Last State
    Change Cause: Normal State Change (0x0)
     Device ID String: "NetraCP-3250"
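
To get a quick inventory from a long Board Information session, the Device ID strings can be collected per physical slot. This is a minimal Python sketch that relies only on the "Physical Slot # N" and "Device ID String:" lines shown above; the function name is hypothetical.

```python
import re

SLOT = re.compile(r"Physical Slot # (\d+)")
DEVICE = re.compile(r'Device ID String: "([^"]*)"')


def devices_by_slot(board_info):
    """Return {physical slot number: [Device ID strings in order]}."""
    slots = {}
    current = None
    for line in board_info.splitlines():
        slot = SLOT.search(line)
        if slot:
            current = int(slot.group(1))
            slots[current] = []
            continue
        device = DEVICE.search(line)
        if device and current is not None:
            slots[current].append(device.group(1))
    return slots


SAMPLE = """\
Physical Slot # 1
9a: FRU # 0
Device ID String: "Netra CP3260"
9a: FRU # 1
Device ID String: "CP32X0-RTM-HDD"
... Physical Slot # 14
9c: FRU # 0
Device ID String: "NetraCP-3250"
"""

print(devices_by_slot(SAMPLE))
```

The first string listed for a slot is FRU 0 (the blade itself); later strings are its RTM or AMC modules.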

Q: How to identify chassis components?
A: Check the entries that start with 20: in the >>>Detailed FRU Information Session;
     >>> Detailed FRU Information
     10: FRU # 0
      Entity: (0xf0, 0x60)
      Hot Swap State: M4 (Active), Previous: M3 (Activation In Process), Last State
      Change Cause: Normal State Change (0x0)
      Device ID String: "ShMM-500"
      Site Type: 0x03, Site Number: 01
      Current Power Level: 0x01, Maximum Power Level: 0x01, Current Power
      Allocation: 20.0 Watts
      ...
      20: FRU # 3
      Entity: (0x1e, 0x60)
     Hot Swap State: M4 (Active), Previous: M3 (Activation In Process), Last State
     Change Cause: Normal State Change (0x0)
     Device Type: "FRU Inventory Device behind management controller" (0x10),
     Modifier 0x0
     Device ID String: "Fan Tray 0"
     Site Type: 0x04, Site Number: 01
     Current Power Level: 0x01, Maximum Power Level: 0x01, Current Power
     Allocation: 200.0 Watts
     20: FRU # 4
     Entity: (0x1e, 0x61)
     Hot Swap State: M4 (Active), Previous: M3 (Activation In Process), Last State
     Change Cause: Normal State Change (0x0)
     Device Type: "FRU Inventory Device behind management controller" (0x10),
     Modifier 0x0
     Device ID String: "Fan Tray 1"
     Site Type: 0x04, Site Number: 02
     Current Power Level: 0x01, Maximum Power Level: 0x01, Current Power
     Allocation: 200.0 Watts
     20: FRU # 5
     Entity: (0x1e, 0x62)
     Hot Swap State: M4 (Active), Previous: M3 (Activation In Process), Last State
     Change Cause: Normal State Change (0x0)
     Device Type: "FRU Inventory Device behind management controller" (0x10),
     Modifier 0x0
     Device ID String: "Fan Tray 2"
     Site Type: 0x04, Site Number: 03
     Current Power Level: 0x01, Maximum Power Level: 0x01, Current Power
     Allocation: 200.0 Watts
     20: FRU # 6
     Entity: (0x15, 0x60)
     Hot Swap State: M4 (Active), Previous: M3 (Activation In Process), Last State
     Change Cause: Normal State Change (0x0)
     Device Type: "FRU Inventory Device behind management controller" (0x10),
     Modifier 0x0
     Device ID String: "PEM A"
     Site Type: 0x01, Site Number: 01
     Current Power Level: 0x01, Maximum Power Level: 0x01, Current Power
     Allocation: 20.0 Watts
     20: FRU # 7
     Entity: (0x15, 0x61)
     Hot Swap State: M4 (Active), Previous: M3 (Activation In Process), Last State
    Change Cause: Normal State Change (0x0)
    Device Type: "FRU Inventory Device behind management controller" (0x10),
    Modifier 0x0
    Device ID String: "PEM B"
    Site Type: 0x01, Site Number: 02
    Current Power Level: 0x01, Maximum Power Level: 0x01, Current Power
    Allocation: 20.0 Watts
    20: FRU # 8
    Entity: (0xf3, 0x6f)
    Hot Swap State: M4 (Active), Previous: M3 (Activation In Process), Last State
   Change Cause: Normal State Change (0x0)
    Device Type: "FRU Inventory Device behind management controller" (0x10),
    Modifier 0x0
    Device ID String: "SAP Board"
    Site Type: 0x06, Site Number: 01
    Current Power Level: 0x01, Maximum Power Level: 0x01, Current Power
   Allocation: 2.0 Watts
   Entries starting with 10: and 12: are shm1 and shm2; the entries starting with 20: are chassis FRUs.

Q: How to find the IP address of ShMM?
A: Find eth0 of the >>>Network Interfaces Session;
    >>> Network Interfaces
    ...
   eth0 Link encap:Ethernet HWaddr 00:50:C2:3F:D1:30
   inet addr:10.5.58.110 Bcast:10.255.255.255 Mask:255.255.248.0
   UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
   RX packets:12685231 errors:1909 dropped:1909 overruns:0 frame:0
   TX packets:6569 errors:0 dropped:0 overruns:0 carrier:0
   collisions:0 txqueuelen:1000
   RX bytes:865220012 (825.1 MiB) TX bytes:542114 (529.4 KiB)
   Interrupt:27
   eth1 Link encap:Ethernet HWaddr 00:50:C2:3F:D1:31
   inet addr:192.168.2.1 Bcast:192.168.7.255 Mask:255.255.248.0
   UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
   RX packets:0 errors:0 dropped:0 overruns:0 frame:0
   TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
   collisions:0 txqueuelen:1000
   RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
   Interrupt:28
   ...
   vlan55 Link encap:Ethernet HWaddr 00:50:C2:3F:D1:30
   inet addr:192.168.13.109 Bcast:192.168.13.255 Mask:255.255.255.224
   UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
   RX packets:301 errors:0 dropped:0 overruns:0 frame:0
   TX packets:266 errors:0 dropped:0 overruns:0 carrier:0
   collisions:0 txqueuelen:0
   RX bytes:17940 (17.5 KiB) TX bytes:17791 (17.3 KiB)
   Please also note that the IP of vlan55 is FIXED and defined in the /var/netcons.ip file.
   More network interface info can also be found in the >>>Shell Environment
   Variables Session;
   >>> Shell Environment Variables
   SCHROFF_VARIANT=ACB-III
   CARRIER=ACB
    RMCPADDR=10.5.63.110
    OPENHPI_THREADED=YES
    OPENHPI_DEBUG=NO
    OPENHPI_UID_MAP=/tmp/uid_map
    OPENHPI_CONF=/etc/openhpi.conf
    GATEWAY=10.5.56.1
    NETMASK=255.255.248.0
    IP1DEVICE=eth1
    IP1ADDR=192.168.1.2
    IPDEVICE=eth0
    IPADDR=192.168.0.4
    TZ=UTC
    LOGNAME=root
    USER=root
    MAIL=/var/mail/root
     TERM=vt100
    SHELL=/bin/sh
     PATH=/var/bin:/sbin:/bin:/usr/sbin:/usr/bin
     HOME=/home/root
     IFS=
     PS1=
     PS2=
     FILE=/tmp/debug.log
    Or the >>>ShMC LAN Configuration Session;
     >>> ShMC LAN Configuration
    Authentication Type Support: 0x15 ( None MD5 Straight Password/Key )
    Authentication Type Enables:
    Callback level: 0x00
    User level: 0x15 ( "None" "MD5" "Straight Password/Key" )
    Operator level: 0x15 ( "None" "MD5" "Straight Password/Key" )
    Administrator level: 0x15 ( "None" "MD5" "Straight Password/Key" )
    OEM level: 0x00
    IP Address: 10.5.58.110
    IP Address Source: Static Address (Manually Configured) (0x01)
    MAC Address: 00:50:c2:3f:d1:30
    Subnet Mask: 255.255.248.0
    IPv4 Header Parameters: 0x40:0x40:0x10
    Primary RMCP Port Number: 0x026f
    Secondary RMCP Port Number: 0x0298
    BMC-generated ARP Control: 0x02
    Enable BMC-generated ARP Response
    Gratuitous ARP Interval: 2.0 seconds
    Default Gateway Address: 10.5.56.1
    Default Gateway MAC Address: 00:04:96:1e:29:30
    Backup Gateway Address: 0.0.0.0
    Backup Gateway MAC Address: N/A
    Community String: "public"
    Number of Destinations: 16
    Destination Type:
    N/A
    Destination Address:
    N/A

Q: What to look for in the >>>U-Boot Environment Variables Session?
A: There are a couple of parameters to check to prevent configuration problems;
>>> U-Boot Environment Variables

...
ipdevice=eth0
netmask=255.255.248.0
ip1addr=192.168.1.2
ip1device=eth1
rc2=/etc/rc.acb3
rmcpaddr=10.5.63.110
gateway=10.5.56.1
rc_ifconfig=y
ipaddr=192.168.0.4
baudrate=9600
console=ttyS0
addmisc=setenv bootargs $(bootargs) $(quiet) console=$(console),$(baudrate)
reliable_upgrade=$(reliable_upgrade)




The baudrate, console and addmisc parameters have to be set the way they are
shown here; the baud rate of the ShMM console then becomes 9600 (instead of the
default 115200).
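
The same check can be scripted against a saved copy of the session. The sketch below is a minimal Python example; the expected values are taken from the sample above plus the rc2 value from the Troubleshooting tips, so they are assumptions about this one configuration (a site that runs the console at 115200 would expect a different baudrate). The function names are hypothetical.

```python
# Expected values, copied from the sample session above (assumption:
# the 9600 baud console setup described in this document is wanted).
EXPECTED = {
    "rc2": "/etc/rc.acb3",
    "baudrate": "9600",
    "console": "ttyS0",
    "addmisc": "setenv bootargs $(bootargs) $(quiet) "
               "console=$(console),$(baudrate)",
}


def parse_env(text):
    """Parse name=value lines; only the first '=' separates name and value."""
    env = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key:
            env[key] = value
    return env


def mismatches(env):
    """Return {name: actual value or None} for variables that differ."""
    return {name: env.get(name)
            for name, wanted in EXPECTED.items()
            if env.get(name) != wanted}


SAMPLE = """\
rc2=/etc/rc.acb3
rc_ifconfig=y
baudrate=9600
console=ttyS0
addmisc=setenv bootargs $(bootargs) $(quiet) console=$(console),$(baudrate)
"""

print(mismatches(parse_env(SAMPLE)))
```

An empty result means all four variables match; any other result lists the variables to review with their current values.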

Q: How to identify whether there is a temperature problem?
A: Look for the >>>Fan List, >>>Cooling State, and >>>Fan State Sessions;
If everything is normal, it should look like:
   >>> Fan List
   20: FRU # 3
   Current Level: 5
   Minimum Speed Level: 0, Maximum Speed Level: 15
   20: FRU # 4
   Current Level: 5
   Minimum Speed Level: 0, Maximum Speed Level: 15
   20: FRU # 5
   Current Level: 5
   Minimum Speed Level: 0, Maximum Speed Level: 15
   >>> Cooling State
   Cooling state: "Normal"
   Sensor(s) at this state: (0x82,20,0) (0x82,36,0) (0x82,37,0) (0x82,44,0)
   (0x82,45,0) (0x82,52,0) (0x82,53,0) (0x82,10,0)
   (0x92,31,0) (0x20,126,0) (0x20,125,0) (0x86,19,0)
   (0x86,20,0) (0x86,21,0) (0x86,22,0) (0x86,23,0)
   (0x86,24,0) (0x86,25,0) (0x86,26,0) (0x92,6,0)
   (0x92,7,0) (0x92,30,0) (0x9c,4,0) (0x9c,5,0)
   (0x92,5,0) (0x98,4,0) (0x98,5,0) (0x9c,3,0)
   (0x94,7,0) (0x94,8,0) (0x94,23,0) (0x94,24,0)
   (0x94,25,0) (0x94,40,0) (0x94,41,0) (0x94,42,0)
   (0x98,3,0) (0x88,7,0) (0x88,8,0) (0x88,23,0)
   (0x88,24,0) (0x88,25,0) (0x94,6,0) (0x9a,5,0)
   (0x9a,6,0) (0x9a,29,0) (0x9a,30,0) (0x9a,31,0)
   (0x88,6,0) (0x9a,4,0) (0x96,5,0) (0x96,6,0)
   (0x96,29,0) (0x96,30,0) (0x96,31,0) (0x90,6,0)
   (0x90,7,0) (0x90,8,0) (0x90,23,0) (0x90,24,0)
   (0x90,25,0) (0x90,40,0) (0x90,41,0) (0x90,42,0)
   (0x96,4,0) (0x20,120,0) (0x20,121,0) (0x20,122,0)
   (0x20,123,0) (0x20,124,0) (0x20,200,0) (0x20,201,0)
   (0x10,2,0)
   >>> Fan State
   Fans state: "Normal"
   Sensor(s) at this state: (0x10,8,0) (0x10,10,0) (0x10,11,0) (0x10,13,0)
   (0x10,14,0) (0x10,7,0)
   If there is a problem, it would look similar to:
   >>> Fan List
  20: FRU # 3
  Current Level: 15
  Minimum Speed Level: 0, Maximum Speed Level: 15
  20: FRU # 4
  Current Level: 15
  Minimum Speed Level: 0, Maximum Speed Level: 15
  20: FRU # 5
  Current Level: 15
  Minimum Speed Level: 0, Maximum Speed Level: 15
  >>> Cooling State
  Cooling state: "Minor Alert"
  Sensor(s) at this state: (0x98,31,0) (0x9c,30,0) (0x9c,31,0) (0x86,30,0)
  (0x86,31,0)
  >>> Fan State
  Fans state: "Normal"
  Sensor(s) at this state: (0x10,8,0) (0x10,10,0) (0x10,11,0) (0x10,13,0)
  (0x10,14,0) (0x10,7,0)

In this output, the Cooling State shows that some of the temperature sensors are at Minor Alert (instead of Normal). These sensors are read as (IPMB Address, Sensor Number, LUN); for example, (0x98,31,0) is sensor 31 on the blade in slot 13 (0x98). Check what these sensors are and what their readings are before determining whether there is a temperature problem in the chassis; also check for a dirty air filter.
The >>>Fan List shows that all FTs are running at full speed. This may or may not be caused by a temperature problem, because only blades (0x86, 0x98 and 0x9c) are showing a cooling alert, not the chassis (0x20). There might be other problems, possibly power related.
Chassis temperature sensors are:
20: LUN: 0, Sensor # 120 ("Center Exhaust")
20: LUN: 0, Sensor # 121 ("Left Exhaust")
20: LUN: 0, Sensor # 122 ("Right Exhaust")
20: LUN: 0, Sensor # 124 ("Temp_In Left")
20: LUN: 0, Sensor # 125 ("Temp_In Center")
20: LUN: 0, Sensor # 126 ("Temp_In Right")
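
Translating the sensor tuples by hand is error prone, so the lookup can be scripted. The following Python sketch groups the sensors of a Cooling State session by location, using the IPMB Address Table at the top of this document; the function name is hypothetical.

```python
import re
from collections import defaultdict

# IPMB addresses of physical slots 1..14, from the IPMB Address Table.
SLOT_ADDRS = [0x9A, 0x96, 0x92, 0x8E, 0x8A, 0x86, 0x82,
              0x84, 0x88, 0x8C, 0x90, 0x94, 0x98, 0x9C]
LOCATION = {0x10: "shm1", 0x12: "shm2", 0x20: "Shelf"}
LOCATION.update({addr: f"Slot {n}"
                 for n, addr in enumerate(SLOT_ADDRS, start=1)})

# A sensor is printed as (IPMB Address, Sensor Number, LUN).
SENSOR = re.compile(r"\(0x([0-9A-Fa-f]{2}),\s*(\d+),\s*(\d+)\)")


def sensors_by_location(text):
    """Return {location name: [sensor numbers]} for every tuple in text."""
    grouped = defaultdict(list)
    for addr, number, _lun in SENSOR.findall(text):
        name = LOCATION.get(int(addr, 16), f"unknown (0x{addr})")
        grouped[name].append(int(number))
    return dict(grouped)


ALERT = """\
Sensor(s) at this state: (0x98,31,0) (0x9c,30,0) (0x9c,31,0) (0x86,30,0)
(0x86,31,0)
"""

result = sensors_by_location(ALERT)
print(result)
print("chassis sensor in alert:", "Shelf" in result)
```

For the Minor Alert sample above, only slots 13, 14 and 6 appear and "Shelf" does not, which matches the reading that the blades, not the chassis, report the alert.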

Q: Any power related information?
A: Look into the >>>Shelf FRU Info Session, and find the following;
PICMG Shelf Power Distribution Record (ID=0x11)
Version = 0
Feed count: 4
Feed:
Maximum External Available Current: 28.0 Amps
Maximum Internal Current: 27.6 Amps
Minimum Expected Operating Voltage: -40.5 Volts
Feed-to-FRU Mapping entries count: 3
FRU Addr: 45, FRU ID: 0xfe <---- Slot 5
FRU Addr: 49, FRU ID: 0xfe <---- Slot 3
FRU Addr: 4d, FRU ID: 0xfe <---- Slot 1
Feed:
Maximum External Available Current: 28.0 Amps
Maximum Internal Current: 27.6 Amps
Minimum Expected Operating Voltage: -40.5 Volts
Feed-to-FRU Mapping entries count: 6
FRU Addr: 41, FRU ID: 0xfe <---- Slot 7
FRU Addr: 43, FRU ID: 0xfe <---- Slot 6
FRU Addr: 47, FRU ID: 0xfe <---- Slot 4
FRU Addr: 4b, FRU ID: 0xfe <---- Slot 2
FRU Addr: 08, FRU ID: 0xfe <---- shm1
FRU Addr: 10, FRU ID: 0x03 <---- FT 1
Feed:
Maximum External Available Current: 28.0 Amps
Maximum Internal Current: 27.6 Amps
Minimum Expected Operating Voltage: -40.5 Volts
Feed-to-FRU Mapping entries count: 6
FRU Addr: 42, FRU ID: 0xfe <---- Slot 8
FRU Addr: 44, FRU ID: 0xfe <---- Slot 9
FRU Addr: 48, FRU ID: 0xfe <---- Slot 11
FRU Addr: 4c, FRU ID: 0xfe <---- Slot 13
FRU Addr: 09, FRU ID: 0xfe <---- shm2
FRU Addr: 10, FRU ID: 0x04 <---- FT 2
Feed:
Maximum External Available Current: 28.0 Amps
Maximum Internal Current: 27.6 Amps
Minimum Expected Operating Voltage: -40.5 Volts
Feed-to-FRU Mapping entries count: 4
FRU Addr: 46, FRU ID: 0xfe <---- Slot 10
FRU Addr: 4a, FRU ID: 0xfe <---- Slot 12
FRU Addr: 4e, FRU ID: 0xfe <---- Slot 14
FRU Addr: 10, FRU ID: 0x05 <---- FT 3
This record shows what chassis component each PEM feed connects to.
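
To check whether all problematic components hang off the same feed, the record can be parsed and the FRU addresses translated with the HW Address row of the IPMB Address Table. This is a minimal Python sketch; it assumes feeds are numbered in the order they are printed and that FRU ID 0xfe refers to the slot as a whole, as the annotations in the sample suggest. The function names are hypothetical.

```python
import re

# Hardware addresses of physical slots 1..14, from the HW Address row.
SLOT_HW = [0x4D, 0x4B, 0x49, 0x47, 0x45, 0x43, 0x41,
           0x42, 0x44, 0x46, 0x48, 0x4A, 0x4C, 0x4E]
HW_LOCATION = {0x08: "shm1", 0x09: "shm2", 0x10: "Shelf"}
HW_LOCATION.update({addr: f"Slot {n}"
                    for n, addr in enumerate(SLOT_HW, start=1)})

ENTRY = re.compile(
    r"FRU Addr:\s*([0-9A-Fa-f]{2}),\s*FRU ID:\s*0x([0-9A-Fa-f]{2})")


def parse_feeds(record):
    """Return one list of component names per 'Feed:' block, in order."""
    feeds = []
    for line in record.splitlines():
        if line.strip() == "Feed:":
            feeds.append([])
            continue
        match = ENTRY.search(line)
        if match and feeds:
            addr = int(match.group(1), 16)
            fru_id = int(match.group(2), 16)
            name = HW_LOCATION.get(addr, f"unknown (0x{addr:02x})")
            if fru_id != 0xFE:  # assumption: 0xfe means the whole slot
                name = f"{name} FRU {fru_id}"
            feeds[-1].append(name)
    return feeds


def feeds_of(feeds, names):
    """Return {component name: 1-based feed number} for the given names."""
    return {name: index
            for index, feed in enumerate(feeds, start=1)
            for name in feed if name in names}


RECORD = """\
Feed:
FRU Addr: 45, FRU ID: 0xfe
FRU Addr: 49, FRU ID: 0xfe
FRU Addr: 4d, FRU ID: 0xfe
Feed:
FRU Addr: 41, FRU ID: 0xfe
FRU Addr: 08, FRU ID: 0xfe
FRU Addr: 10, FRU ID: 0x03
Feed:
FRU Addr: 42, FRU ID: 0xfe
FRU Addr: 4c, FRU ID: 0xfe
FRU Addr: 09, FRU ID: 0xfe
"""

feeds = parse_feeds(RECORD)
print(feeds_of(feeds, {"Slot 8", "Slot 13", "shm2"}))
```

RECORD is an abridged copy of the sample above. If every failing component maps to the same feed number, that feed and its PEM are the first things to check.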

Q: What is the >>>System Event Log Session?
A: The >>>System Event Log Session records all chassis related events.
0x0001: <D&T>; from:(0x98,0,0); sensor:(0x02,20); event:0x1(deasserted): "Lower Critical", 0x02 0xFF 0xFF
0x0002: <D&T>; from:(0x98,0,0); sensor:(0x02,20); event:0x1(deasserted): "Lower Non-Critical", 0x00 0xFF 0xFF
0x0003: <D&T>; from:(0x98,0,0); sensor:(0x02,21); event:0x1(asserted): "Lower Non-Critical", 0x00 0xFF 0xFF
0x0004: <D&T>; from:(0x98,0,0); sensor:(0xf0,2); event:0x6f(asserted): HotSwap: FRU 2 M2->M3, Cause=0x1
0x0005: <D&T>; from:(0x88,0,0); sensor:(0xf0,2); event:0x6f(asserted): HotSwap: FRU 2 M0->M1, Cause=0x0
0x0006: <D&T>; from:(0x88,0,0); sensor:(0xf0,2); event:0x6f(asserted): HotSwap: FRU 2 M1->M2, Cause=0x2
0x0007: <D&T>; from:(0x98,0,0); sensor:(0xf0,2); event:0x6f(asserted): HotSwap: FRU 2 M3->M4, Cause=0x0
0x0008: <D&T>; from:(0x88,0,0); sensor:(0xf0,2); event:0x6f(asserted): HotSwap: FRU 2 M2->M3, Cause=0x1
0x0009: <D&T>; from:(0x88,0,0); sensor:(0xf0,2); event:0x6f(asserted): HotSwap: FRU 2 M3->M4, Cause=0x0
0x000A: <D&T>; from:(0x10,0,0); sensor:(0xde,128); event:0x6f(asserted): 0x7C 0x21 0x32
0x000B: <D&T>; from:(0x12,0,0); sensor:(0xf0,0); event:0x6f(asserted): HotSwap: FRU 0 M7->M1, Cause=0x4
0x000C: <D&T>; from:(0x12,0,0); sensor:(0xf0,0); event:0x6f(asserted): HotSwap: FRU 0 M1->M2, Cause=0x2
0x000D: <D&T>; from:(0x84,0,0); sensor:(0xf1,1); event:0x6f(asserted): 0xA3 0x00 0x88
0x000E: <D&T>; from:(0x84,0,0); sensor:(0x08,2); event:0x3(asserted): 0x00 0xFF 0xFF
0x000F: <D&T>; from:(0x84,0,0); sensor:(0x15,3); event:0x8(asserted): 0x00 0xFF 0xFF
0x0010: <D&T>; from:(0x84,0,0); sensor:(0x07,4); event:0x3(asserted): 0x00 0xFF 0xFF
0x0011: <D&T>; from:(0x12,0,0); sensor:(0xf0,0); event:0x6f(asserted): HotSwap: FRU 0 M2->M3, Cause=0x1
0x0012: <D&T>; from:(0x12,0,0); sensor:(0xf0,0); event:0x6f(asserted): HotSwap: FRU 0 M3->M4, Cause=0x0
0x0013: <D&T>; from:(0x10,0,0); sensor:(0xde,128); event:0x6f(asserted): 0x78 0x2C 0x32
0x0014: <D&T>; from:(0x10,0,0); sensor:(0xde,128); event:0x6f(asserted): 0x78 0x2C 0x32
0x0015: <D&T>; from:(0x10,0,0); sensor:(0xde,128); event:0x6f(deasserted): 0x78 0x2C 0x32
0x0016: <D&T>; from:(0x88,0,0); sensor:(0x02,14); event:0x1(deasserted): "Lower Non-Critical", 0x00 0xFF 0xFF
0x0017: <D&T>; from:(0x98,0,0); sensor:(0x02,14); event:0x1(deasserted): "Lower Non-Critical", 0x00 0xFF 0xFF
0x0018: <D&T>; from:(0x98,0,0); sensor:(0xf1,3); event:0x6f(asserted): 0xA3 0x00 0x88
0x0019: <D&T>; from:(0x98,0,0); sensor:(0x02,9); event:0x1(asserted): "Lower Non-Critical", 0x00 0xFF 0xFF
0x001A: <D&T>; from:(0x98,0,0); sensor:(0x02,9); event:0x1(asserted): "Lower Critical", 0x02 0xFF 0xFF
0x001B: <D&T>; from:(0x98,0,0); sensor:(0x02,21); event:0x1(asserted): "Upper Non-Critical", 0x07 0xFF 0xFF
0x001C: <D&T>; from:(0x98,0,0); sensor:(0x25,25); event:0x6f(asserted): 0x01 0xFF 0xFF
0x001D: <D&T>; from:(0x84,0,0); sensor:(0x07,4); event:0x3(deasserted): 0x61 0xF1 0x80
0x001E: <D&T>; from:(0x84,0,0); sensor:(0x07,5); event:0x3(asserted): 0x61 0xF0 0x00
0x001F: <D&T>; from:(0x10,0,0); sensor:(0xde,128); event:0x6f(asserted): 0x79 0x28 0x30
0x0020: <D&T>; from:(0x20,0,0); sensor:(0x25,168); event:0x6f(asserted): 0x01 0xFF 0xFF
0x0021: <D&T>; from:(0x20,0,0); sensor:(0x25,169); event:0x6f(asserted): 0x01 0xFF 0xFF
0x0022: <D&T>; from:(0x10,0,0); sensor:(0xde,128); event:0x6f(asserted): 0x71 0x09 0x30
0x0023: <D&T>; from:(0x10,0,0); sensor:(0xde,128); event:0x6f(deasserted): 0x71 0x09 0x30
0x0024: <D&T>; from:(0x10,0,0); sensor:(0xde,128); event:0x6f(deasserted): 0x71 0x09 0x30
0x0025: <D&T>; from:(0x20,0,0); sensor:(0x25,168); event:0x6f(asserted): 0x00 0xFF 0xFF
0x0026: <D&T>; from:(0x20,0,0); sensor:(0x25,169); event:0x6f(asserted): 0x00 0xFF 0xFF
0x0027: <D&T>; from:(0x88,0,0); sensor:(0xf0,2); event:0x6f(asserted): HotSwap: FRU 2 M4->M0, Cause=0x6
0x0028: <D&T>; from:(0x88,0,0); sensor:(0xf0,0); event:0x6f(asserted): HotSwap: FRU 0 M4->M0, Cause=0x6
0x0029: <D&T>; from:(0x98,0,0); sensor:(0xf0,2); event:0x6f(asserted): HotSwap: FRU 2 M4->M0, Cause=0x6
0x002A: <D&T>; from:(0x98,0,0); sensor:(0xf0,0); event:0x6f(asserted): HotSwap: FRU 0 M4->M0, Cause=0x6
0x002B: <D&T>; from:(0x12,0,0); sensor:(0xf0,0); event:0x6f(asserted): HotSwap: FRU 0 M4->M7, Cause=0x4
0x002C: <D&T>; from:(0x88,0,0); sensor:(0xf1,3); event:0x6f(asserted): 0xA3 0x00 0x88
0x002D: <D&T>; from:(0x98,0,0); sensor:(0xf0,2); event:0x6f(asserted): HotSwap: FRU 2 M0->M0, Cause=0x0
0x002E: <D&T>; from:(0x84,0,0); sensor:(0xf0,0); event:0x6f(asserted): HotSwap: FRU 0 M4->M7, Cause=0x4
0x002F: <D&T>; from:(0x88,0,0); sensor:(0x02,8); event:0x1(asserted): "Lower Non-Critical", 0x00 0xFF 0xFF
0x0030: <D&T>; from:(0x98,0,0); sensor:(0x02,8); event:0x1(asserted): "Lower Non-Critical", 0x00 0xFF 0xFF
0x0031: <D&T>; from:(0x88,0,0); sensor:(0x02,8); event:0x1(asserted): "Lower Critical", 0x02 0xFF 0xFF
0x0032: <D&T>; from:(0x98,0,0); sensor:(0x02,8); event:0x1(asserted): "Lower Critical", 0x02 0xFF 0xFF

NOTE: <D&T> is <Date and Time>
The message has the format:
<ID>: Event at <D&T>; from:(IPMB, FRU, LUN); sensor:(<type>, <#>); event:<event type>: <Details>

Thus, from:(0x84,0,0); sensor:(0x07,4); event:0x3(asserted) shows that the event is from slot 8 (0x84, switch blade), FRU 0 (the board itself), and that sensor 4 is asserted. Use the data collection commands to find out what sensor 4 of slot 8 is; then use the sel -v command to see the exact details of the event.

Since the events come from all components in the chassis, group the events from the same component (whether it is a blade or a chassis component such as ShMM, FT or PEM) according to the symptom given by Customer, in order to judge what the root cause (RC) might be.
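
The grouping can be done with a short script on a saved copy of the session. The sketch below is a minimal Python example that parses lines in the format of the sample above and groups them by location using the IPMB Address Table; the class and function names are hypothetical, and real SEL output with actual timestamps may need a looser pattern.

```python
import re
from collections import defaultdict
from typing import NamedTuple

# IPMB addresses of physical slots 1..14, from the IPMB Address Table.
SLOT_ADDRS = [0x9A, 0x96, 0x92, 0x8E, 0x8A, 0x86, 0x82,
              0x84, 0x88, 0x8C, 0x90, 0x94, 0x98, 0x9C]
LOCATION = {0x10: "shm1", 0x12: "shm2", 0x20: "Shelf"}
LOCATION.update({addr: f"Slot {n}"
                 for n, addr in enumerate(SLOT_ADDRS, start=1)})


class SelEvent(NamedTuple):
    record_id: int
    ipmb: int
    fru: int
    sensor_type: int
    sensor_number: int
    event_type: int
    asserted: bool
    details: str


SEL_LINE = re.compile(
    r"^0x([0-9A-Fa-f]+):.*?from:\(0x([0-9A-Fa-f]+),(\d+),(\d+)\);\s*"
    r"sensor:\(0x([0-9A-Fa-f]+),(\d+)\);\s*"
    r"event:0x([0-9A-Fa-f]+)\((asserted|deasserted)\):\s*(.*)$"
)


def parse_sel(text):
    """Parse SEL lines into SelEvent tuples; non-matching lines are skipped."""
    events = []
    for line in text.splitlines():
        m = SEL_LINE.match(line.strip())
        if m:
            events.append(SelEvent(
                int(m.group(1), 16), int(m.group(2), 16), int(m.group(3)),
                int(m.group(5), 16), int(m.group(6)), int(m.group(7), 16),
                m.group(8) == "asserted", m.group(9)))
    return events


def group_by_location(events):
    """Return {location name: [events]} keyed by the source IPMB address."""
    grouped = defaultdict(list)
    for event in events:
        name = LOCATION.get(event.ipmb, f"0x{event.ipmb:02x}")
        grouped[name].append(event)
    return dict(grouped)


SAMPLE = """\
0x000E: <D&T>; from:(0x84,0,0); sensor:(0x08,2); event:0x3(asserted): 0x00 0xFF 0xFF
0x0010: <D&T>; from:(0x84,0,0); sensor:(0x07,4); event:0x3(asserted): 0x00 0xFF 0xFF
0x001D: <D&T>; from:(0x84,0,0); sensor:(0x07,4); event:0x3(deasserted): 0x61 0xF1 0x80
0x002B: <D&T>; from:(0x12,0,0); sensor:(0xf0,0); event:0x6f(asserted): HotSwap: FRU 0 M4->M7, Cause=0x4
"""

for location, events in group_by_location(parse_sel(SAMPLE)).items():
    print(location, [f"0x{e.record_id:04X}" for e in events])
```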

Q: Where is ShMM booting related information located?
A: It is in the last session: >>>Shelfman Output to syslog
This session is a collection of the /var/log/messages files. Some events in the SEL
are also logged here.


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.