Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-79-1396556.1
Update Date:2018-01-03
Keywords:

Solution Type  Predictive Self-Healing Sure

Solution  1396556.1 :   Datacenter Switch36  


Related Items
  • Sun Datacenter InfiniBand Switch 36
Related Categories
  • PLA-Support>Sun Systems>SAND>Network>SN-SND: Sun Network Infiniband
  • _Old GCS Categories>Sun Microsystems>Operating Systems>Solaris Network
  • _Old GCS Categories>Sun Microsystems>Switches>Sun InfiniBand IB




In this Document
Purpose
Details
 Product Support Team
 Alerts
 Description
  Sun Datacenter Infiniband Switch 36 (M2-36p)
  Management Access
  Rear Status LEDs
  Front Status LEDs
 Versions
  Firmware
  Hardware
  Cables
 Compatibility/Patches
 Configuration
 FAQ
 Information Gathering
  Example output of information gathering commands
 Installation
 De-installation
 Troubleshooting
  Internal Port Mapping
  Physical Connector Numbering (on front panel)
 Performance
 Lab
 Contacts
 Proactive
 Training:


Oracle Confidential PARTNER - Available to partners (SUN).
Reason: Solaris and Network Domain (SaND) Product Page, internal content

Applies to:

Sun Datacenter InfiniBand Switch 36 - Version Not Applicable and later
Information in this document applies to any platform.

Purpose

This document contains the Datacenter Switch36 Product Page.

Details

Product Support Team


Solaris Networking Team: Infiniband Switches, PLA: SN-SND: Sun Network Infiniband

Storage Driver Team: Components of Solaris Infiniband Stack, PLA: SN-DK: Storage Drivers

  • Solaris Infiniband driver
  • HCA (Host Channel Adapter, H/W)
  • RDS
  • EoIB
  • IPoIB
  • OpenFabrics User Verbs (OFUV)
  • Lustre LND
  • SRP (SCSI RDMA Protocol)
  • SDP
  • iSER (iSCSI Extensions for RDMA)
  • FCoIP
  • rNFS
  • uDAPL
  • OpenSM
  • Diag Tools
  • SA
  • MAD
  • SMA
  • Communications Manager

 

Alerts


 

CR# 6931851 nm2-36p kontron hangs sometimes and needs to be power cycled

Workaround: power-cycle the switch. It may be necessary to remove both power cords for about a minute and then re-attach them.

Fixed in FW 1.1.3-2b.

 

CR# 7005108 NM2-36p OS hang on version 113

Workaround: power-cycle the switch. It may be necessary to remove both power cords for about a minute and then re-attach them.

This hang has a different cause from the hang described in CR# 6931851 and has not been fully root-caused yet.

FW 1.1.4 has been built with a watchdog that detects this hang and resets the switch automatically. This FW is available via MOS (not qualified for Exadata systems).

 

 

 

Description


Sun Datacenter Infiniband Switch 36 (M2-36p)

Sun Datacenter InfiniBand Switch 36 Firmware Version 1.3
Sun Datacenter InfiniBand Switch 36 Firmware Version 2.0

Also known as DCS 36 or SDS 36 (also used in the Oracle Exadata V2 Machine)

 

This is an Infiniband leaf switch which can also be utilized in stand-alone mode or as a fabric manager node in a small cluster of switches.

The Sun Datacenter Switch 36 is housed in a 19" 1RU chassis and consists of a system board with one Mellanox Infiniscale IV switch chip.

It has 12 stacked QSFP connectors, each pair providing six 4X Infiniband ports for a total of 36 ports.

Each port is capable of QDR, DDR or SDR speeds and the switch provides fully non-blocking connections with a total data throughput of 2.3 Tb/sec (bidirectional).
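
The 2.3 Tb/sec figure can be reproduced from the port count. The Python sketch below is illustrative only: it assumes the standard InfiniBand QDR signalling rate of 10 Gb/s per lane and 8b/10b encoding, neither of which is stated in this document.

```python
# Illustrative arithmetic only. The lane rate and encoding efficiency are
# assumptions (standard InfiniBand QDR values), not figures from this page.
PORTS = 36                     # 4X ports on the DCS 36
LANES_PER_PORT = 4             # a 4X port carries four lanes
QDR_SIGNAL_GBPS = 10.0         # assumed QDR signalling rate per lane
ENCODING_EFFICIENCY = 8 / 10   # assumed 8b/10b line encoding


def aggregate_throughput_tbps(ports=PORTS):
    """Return the bidirectional data throughput of all ports in Tb/s."""
    per_port_gbps = LANES_PER_PORT * QDR_SIGNAL_GBPS * ENCODING_EFFICIENCY
    one_way_gbps = ports * per_port_gbps
    return 2 * one_way_gbps / 1000.0


if __name__ == "__main__":
    print(f"{aggregate_throughput_tbps():.3f} Tb/s")
```

Under these assumptions each port carries 32 Gb/s of data in each direction, and 36 ports in both directions give roughly 2.3 Tb/s.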

Latency between ports is 100nsec at QDR rate.

 

 

 

Management Access

There are two RJ45 10/100/1000T Ethernet ports for management access. These ports are connected to an internal Broadcom BCM5384KPMG Ethernet switch, which allows multiple DCS 36 switches to be daisy-chained together for out-of-band management. There is also a USB port, connected to the service processor and fitted with over-current protection, that can be used to provide serial access to the management function.

There is an on-board COM (Computer-on-Module) to host the IB Fabric Manager software and the internal chassis management software.

Connector-side (rear) view of the DCS 36

Fan-side (front) view of DCS 36

 

The DCS 36 has one fan board that can support five fan modules, but only the three innermost modules are present. All of them are hot-swappable.

Note: Air flows from the fans, through the chassis, and out of the connector panel. The front of the chassis (fan end) draws air from the cold aisle, and warm air is exhausted from the rear of the chassis (connector end) to the hot aisle.

There are two power distribution boards and two hot-swappable power supplies, each with its own power cord, shipped as standard.

The embedded IB Fabric Manager supports active/hot-standby dual manager configurations.

IPMI and Shelf Management functions are included.

Passive QSFP copper cables up to 5m are supported. Cable insertion detection is implemented, and the cable serial number held in the cable NVRAM can be read by the embedded management module.

For very large Infiniband fabrics it is recommended to use a host-based subnet manager to ensure that there is sufficient CPU power available for the management of a large fabric.

 

Rear Status LEDs

 

 Identifying LEDs

  • NET LEDs

The network management port status LEDs are located on the network management connector at the left side of the rear panel (cable connector side).

 

Link (l/h led)     :  Green     On       = 1000 Mb link
                                Off      = link down     
                      Amber     On       = 100 Mb or 10 Mb link

 

Activity (r/h led) :  Green     On       = link up
                                Flashing = packet activity

 

  • Link LEDs

The link status LEDs are located at the InfiniBand connectors of the rear panel

 

Link :     Green     On       = link established
                     Off      = link down
                     Flashing = symbol errors

 

  • Chassis LEDs

The chassis status LEDs are located on the right side of the rear panel

Top :   Locator (white)    On       = no function
                           Off      = disabled
                           Flashing = switch is identifying itself

Middle: Attention (amber)  On       = normal fault detected
                           Off      = no faults detected
                           Flashing = critical fault detected

Bottom :OK (green)         On       = switch is functional
                           Off      = switch is off or initializing
                           Flashing = no function

 

 

Front Status LEDs

The power supply status LEDs are located to the left side of each power supply at the front of the switch chassis.

 

  • Power Supply LEDs
Top:    OK (green)         On       = 12v DC is supplied
                           Off      = no DC voltage is present
                           Flashing = power supply is disabled, 12v DC is not supplied

Middle: Attention (amber)  On       = fault detected, 12v DC shutdown
                           Off      = no faults detected
                           Flashing = no function

Bottom: AC (green)         On       = AC power present and good
                           Off      = AC power not present
                           Flashing = fault or over voltage

 

  • Fan LEDs

The fan status LEDs are located in the lower right of each fan module at the front of the switch chassis

 

Attention:                  On      = fan is faulty
                            Off     = no fault

 

 

Versions


 

Firmware

To display the firmware version :

# version

SUN DCS 36p version: 1.3.3-2
Build time: Mar 25 2010 10:00:23
SP board info:
Manufacturing Date: 2009.06.22
Serial Number: "NCD3R0442"
Hardware Revision: 0x0006
Firmware Revision: 0x0102
BIOS version: NOW1R112     
BIOS date: 04/24/2009
#
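
When collecting data from several switches it can help to turn this output into fields. The following Python sketch is a hedged example, not an Oracle tool: the function name parse_version_output is hypothetical, and it assumes the output keeps the "name: value" layout shown above.

```python
# Hypothetical helper: assumes `version` output keeps the "name: value"
# layout shown in this document. Not part of the switch firmware.
SAMPLE = """\
SUN DCS 36p version: 1.3.3-2
Build time: Mar 25 2010 10:00:23
SP board info:
Manufacturing Date: 2009.06.22
Serial Number: "NCD3R0442"
Hardware Revision: 0x0006
Firmware Revision: 0x0102
BIOS version: NOW1R112
BIOS date: 04/24/2009
"""


def parse_version_output(text):
    """Return a dict of the non-empty "name: value" fields in the output."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if ":" not in line:
            continue  # skips prompts and blank lines
        key, _, value = line.partition(":")  # split at the first colon only
        value = value.strip().strip('"')
        if value:  # "SP board info:" is a section header with no value
            info[key.strip()] = value
    return info


if __name__ == "__main__":
    fields = parse_version_output(SAMPLE)
    print(fields["SUN DCS 36p version"], fields["Serial Number"])
```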

 

Hardware

541-4267 36-Port Infiniband Switch
541-3495 36-Port Infiniband Switch Subassembly
350-1566 Fan Module
300-2143 760W Power Supply
300-2233 760W Power Supply, supersedes 300-2143
371-2210 CR2032 Battery

System Handbook

 

 

Cables

The DCS 36 has QSFP ports so the cables to connect QDR HCAs are :

Part#     Option#     Connector    Length    Type
530-4402  X2886-1M    QSFP - QSFP  1 meter   4X - 4X Infiniband Cable
530-4403  X2886-2M    QSFP - QSFP  2 meter   4X - 4X Infiniband Cable
530-4404  X2886-3M    QSFP - QSFP  3 meter   4X - 4X Infiniband Cable
530-4415  X2886-5M    QSFP - QSFP  5 meter   4X - 4X Infiniband Cable
530-4444  X2121A-1M   QSFP - QSFP  1 meter   QSFP - QSFP Cable (1)
530-4567  X2121A-2M   QSFP - QSFP  2 meter   QSFP - QSFP Cable (1)
530-4445  X2121A-3M   QSFP - QSFP  3 meter   QSFP - QSFP Cable (1)
530-4446  X2121A-5M   QSFP - QSFP  5 meter   QSFP - QSFP Cable (1)
530-4448  X2121A-10M  QSFP - QSFP  10 meter  QSFP - QSFP Cable (1)

(1) X2121A can be used for both Infiniband and 10GbE (X2886 are no longer orderable)

System Handbook

 

Compatibility/Patches


FW updates available on MOS

Type      Version  Patch Number
Update    1.3.3    11891229
Update    1.1.4    11736860
Update    1.1.3    10364281
Internal  1.1.2    not available
FCS       1.0.1    11825452

Note: Users of FW version 1.0.1 must upgrade to 1.1.3 or 1.1.4 before upgrading to 1.3.3.

(For Exadata switches, apply patch 12373676 in conjunction with 11891229.)
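
The upgrade rule in the note above can be sketched in Python as follows. This is an illustration under stated assumptions, not an Oracle procedure: the function names are hypothetical, and treating every version below 1.1.3 in the same way as 1.0.1 is an assumption that the note does not confirm.

```python
# Hypothetical sketch of the upgrade note. Assumption: any firmware below
# 1.1.3 is handled like 1.0.1 and needs an intermediate step first.
TARGET = "1.3.3"
INTERMEDIATE_CHOICES = ("1.1.3", "1.1.4")


def parse_fw(version):
    """Turn '1.1.3-2b' into (1, 1, 3); the build suffix is ignored."""
    return tuple(int(part) for part in version.split("-")[0].split("."))


def upgrade_path(current, intermediate="1.1.4"):
    """Return the list of firmware versions to install, in order."""
    if intermediate not in INTERMEDIATE_CHOICES:
        raise ValueError("intermediate must be 1.1.3 or 1.1.4")
    cur = parse_fw(current)
    if cur >= parse_fw(TARGET):
        return []                      # already at or beyond 1.3.3
    if cur < parse_fw("1.1.3"):
        return [intermediate, TARGET]  # two-step upgrade
    return [TARGET]                    # direct upgrade


if __name__ == "__main__":
    for fw in ("1.0.1", "1.1.3-2b", "1.3.3-2"):
        print(fw, "->", upgrade_path(fw))
```

Exadata switches need the additional patch listed above; the sketch does not model that case.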

 

Configuration


The DCS 36 has two connection options for communication with the management controller.

The NET connector is an Ethernet interface.
This connector is preferred because it permits remote management of the switch over the Ethernet network.


The USB management connector is labeled USB.
This is the second choice for communication with the management controller in the switch. The management console can be a serial terminal, a system running a TIP connection, or another serial device that communicates with the management controller through a USB-to-serial adapter (not supplied). The serial parameters for communication with the USB-to-serial adapter are typically 115200, 8, N, 1. For reliable operation, it is recommended that the USB serial cable be no longer than 10 meters.

If there is a DHCP server on the management Ethernet network, configure it with the MAC address of the DCS 36 (printed on the serial number plate of the switch chassis) and a free IP address. When the DCS 36 boots, it obtains its IP address via DHCP.

If there is no DHCP server then the USB serial connector can be used for management access.

 

FAQ


Exalogic FAQ

Infiniband Triage

 

Information Gathering


 

On the DCS 36 Infiniband switch the following log files are available

  • /var/log/messages
  • /var/log/opensm.log
  • /var/log/opensm-subnet.lst

(the opensm files may not contain much information if the SDS 36 is not the Subnet Manager Master)

And the following utilities can be used to collect data

Specific to SDS 36 IB switch

  • /usr/local/bin/version
  • /usr/local/bin/env_test
  • /usr/local/bin/listlinkup
  • /usr/bin/uptime

Can be run on either the SDS 36 IB switch or a DBnode

  • /usr/sbin/ibnetdiscover
  • /usr/bin/ibdiagnet -skip dup_guids -pm
    • collect all the files it creates in /tmp: ibdiagnet.db ibdiagnet.fdbs ibdiagnet.log ibdiagnet.lst ibdiagnet.mcfdbs ibdiagnet.pkey ibdiagnet.pm ibdiagnet.sm ibdiagnet_ibis.log
    • if ibdiagnet is run on one of the server systems the "-skip dup_guids" parameter can be omitted
  • /usr/sbin/ibcheckerrors -v (collect output)
  • /usr/sbin/ibqueryerrors.pl
  • /usr/sbin/iblinkinfo.pl
  • /usr/sbin/ibhosts
  • /usr/sbin/ibswitches

Specific to DBnode

  • /opt/oracle.SupportTools/ibdiagtools/verify-topology -t (halfrack or quarterrack)
  • /usr/sbin/ibstat

 

Example output of information gathering commands

listlinkup shows all the connectors on the DCS 36, whether they have a cable present, and, if so, the mapping to its I4 port and the logical state of the link

 

# listlinkup
Connector  0A Not present
Connector  1A Not present
Connector  2A Not present
Connector  3A Not present
Connector  4A Not present
Connector  5A Not present
Connector  6A Not present
Connector  7A Not present
Connector  8A Present <-> I4 Port 31 is up
Connector  9A Present <-> I4 Port 14 is up
Connector 10A Present <-> I4 Port 16 is up
Connector 11A Present <-> I4 Port 18 is up
Connector 12A Not present
Connector 13A Present <-> I4 Port 09 is up
Connector 14A Present <-> I4 Port 07 is up
Connector 15A Present <-> I4 Port 05 is up
Connector 16A Present <-> I4 Port 03 is up
Connector 17A Present <-> I4 Port 01 is up
Connector  0B Not present
Connector  1B Not present
Connector  2B Not present
Connector  3B Not present
Connector  4B Not present
Connector  5B Not present
Connector  6B Not present
Connector  7B Not present
Connector  8B Not present
Connector  9B Present <-> I4 Port 13 is up
Connector 10B Present <-> I4 Port 15 is up
Connector 11B Present <-> I4 Port 17 is up
Connector 12B Present <-> I4 Port 12 is up
Connector 13B Present <-> I4 Port 10 is up
Connector 14B Present <-> I4 Port 08 is up
Connector 15B Present <-> I4 Port 06 is up
Connector 16B Present <-> I4 Port 04 is up
Connector 17B Present <-> I4 Port 02 is up
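
Saved listlinkup output can be parsed to list which connectors are cabled and which I4 ports they map to. The Python sketch below is a hedged example: the function name parse_listlinkup is hypothetical, and the pattern is based only on the two line formats shown above. A link that is not up is assumed to report a single-word state in the same position.

```python
import re

# Pattern built from the two line formats shown in this document. The
# single-word state for a link that is not "up" is an assumption.
LINE_RE = re.compile(
    r"^Connector\s+(?P<conn>\d+[AB])\s+"
    r"(?:Not present|Present <-> I4 Port (?P<port>\d+) is (?P<state>\w+))$"
)

SAMPLE = """\
Connector  0A Not present
Connector  8A Present <-> I4 Port 31 is up
Connector 17B Present <-> I4 Port 02 is up
"""


def parse_listlinkup(text):
    """Map connector name to None (no cable) or (I4 port, link state)."""
    links = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # prompt lines or anything unrecognised
        if match.group("port") is None:
            links[match.group("conn")] = None
        else:
            links[match.group("conn")] = (
                int(match.group("port")),
                match.group("state"),
            )
    return links


if __name__ == "__main__":
    for connector, link in parse_listlinkup(SAMPLE).items():
        print(connector, link)
```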

 

Installation


The following are shipped with the DCS 36 switch:

  • Cable bracket and rackmount kit
  • Cable management bracket and cover
  • Two rack-mounting rail assemblies
  • Assortment of screws and captive nuts
  • Sun Datacenter InfiniBand Switch 36 Getting Started Guide

During installation it is very important to ensure that any copper core InfiniBand cable is not subjected to a bend tighter than a 5 inch (127 mm) radius.

Do not allow any optical InfiniBand cable to bend tighter than a 3.4 inch (85 mm) radius. Tight bends can damage the cable internally.

Do not use zip ties to bundle or support InfiniBand cables. The sharp edges of the ties can damage the cables internally. Use soft hook and loop straps to keep cables organized.

Do not allow any InfiniBand cable to experience extreme tension. Do not pull on an InfiniBand cable or allow it to drag. Pulling on an InfiniBand cable can damage the cable internally.

Do not twist an InfiniBand cable more than one revolution over its entire length. Twisting an InfiniBand cable can damage the cable internally.

Do not route InfiniBand cables where they might be stepped upon or experience rolling loads. Such a crushing effect can damage the cable internally.

 

De-installation


 

 

Troubleshooting


 

To check the overall health of the switch

# showunhealthy
OK - No unhealthy sensors
#

Check the power supplies

# checkpower
PSU 0 present status: OK
PSU 1 present status: OK
#

Check the fans (only three are installed in the DCS 36)

# getfanspeed
Fan 0 not present
Fan 1 rpm 12684
Fan 2 rpm 12946
Fan 3 rpm 12684
Fan 4 not present
#

Note: If there are fewer than two operational fans, the DCS 36 will shut down to prevent thermal overload.
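
The fan rule in the note above can be checked against captured getfanspeed output. The Python sketch below is a hedged example with hypothetical function names. It assumes the line format shown above and treats any fan that reports a non-zero rpm as operational, which is an assumption rather than documented behaviour.

```python
import re

# Assumes the getfanspeed line format shown in this document.
FAN_RE = re.compile(r"^Fan (\d+) (?:rpm (\d+)|not present)$")

SAMPLE = """\
Fan 0 not present
Fan 1 rpm 12684
Fan 2 rpm 12946
Fan 3 rpm 12684
Fan 4 not present
"""


def operational_fans(text):
    """Return the fan numbers that report a non-zero rpm (assumption)."""
    fans = []
    for line in text.splitlines():
        match = FAN_RE.match(line.strip())
        if match and match.group(2) is not None and int(match.group(2)) > 0:
            fans.append(int(match.group(1)))
    return fans


def below_fan_minimum(text, minimum=2):
    """True when fewer than `minimum` fans are operational."""
    return len(operational_fans(text)) < minimum


if __name__ == "__main__":
    print(operational_fans(SAMPLE), below_fan_minimum(SAMPLE))
```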

Check the Infiniband ASIC

# checkboot 0
I4 OK
#

Or, to run all tests

# env_test

NM2 Environment test started:
Starting Voltage test:
Voltage ECB OK
Measured 3.3V Main = 3.28 V
Measured 3.3V Standby = 3.44 V
Measured 12V = 12.00 V
Measured 5V = 5.03 V
Measured VBAT = 3.09 V
Measured 2.5V = 2.51 V
Measured 1.8V = 1.80 V
Measured I4 1.2V = 1.22 V
Voltage test returned OK
Starting PSU test:
PSU 0 present
PSU 1 present
PSU test returned OK
Starting Temperature test:
Back temperature 33.38
Front temperature 35.50
ComEx temperature 38.12
I4 temperature 55, maxtemperature 57
Temperature test returned OK
Starting FAN test:
Fan 0 not present
Fan 1 running at rpm 12684
Fan 2 running at rpm 12946
Fan 3 running at rpm 12684
Fan 4 not present
FAN test returned OK
Starting Connector test:
Connector test returned OK
Starting I4 test:
I4 OK
All I4s OK
I4 test returned OK
NM2 Environment test PASSED
#
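
For a quick pass/fail summary of a saved env_test log, the per-test result lines can be extracted as in the Python sketch below. This is a hedged example: the function name summarize_env_test is hypothetical, and only the passing output shown above is known. The wording of a failing result is assumed to be a single word in the same position.

```python
import re

# Patterns follow the passing output shown in this document. The wording
# used for a failing test is not shown there, so it is an assumption.
TEST_RE = re.compile(r"^(.+) test returned (\w+)$")
OVERALL_RE = re.compile(r"^NM2 Environment test (\w+)$")

SAMPLE = """\
NM2 Environment test started:
Starting Voltage test:
Voltage ECB OK
Voltage test returned OK
Starting PSU test:
PSU test returned OK
Starting FAN test:
FAN test returned OK
NM2 Environment test PASSED
"""


def summarize_env_test(text):
    """Return (overall result or None, {test name: result})."""
    overall = None
    results = {}
    for line in text.splitlines():
        line = line.strip()
        match = TEST_RE.match(line)
        if match:
            results[match.group(1)] = match.group(2)
            continue
        match = OVERALL_RE.match(line)
        if match:
            overall = match.group(1)
    return overall, results


if __name__ == "__main__":
    overall, results = summarize_env_test(SAMPLE)
    failed = [name for name, result in results.items() if result != "OK"]
    print(overall, failed)
```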

Other utilities available :

Check the internal temperatures

# showtemps

Check voltages :

# checkvoltages

Display link status

# listlinkup

Display a port status

# getportstatus 0 port

(where port can be from 1 through 36)

 

 

Internal Port Mapping

 

Connector  0A  <-> Port 20
Connector  1A  <-> Port 22
Connector  2A  <-> Port 24
Connector  3A  <-> Port 26
Connector  4A  <-> Port 28
Connector  5A  <-> Port 30
Connector  6A  <-> Port 35
Connector  7A  <-> Port 33
Connector  8A  <-> Port 31
Connector  9A  <-> Port 14
Connector 10A  <-> Port 16
Connector 11A  <-> Port 18
Connector 12A  <-> Port 11
Connector 13A  <-> Port 09
Connector 14A  <-> Port 07
Connector 15A  <-> Port 05
Connector 16A  <-> Port 03
Connector 17A  <-> Port 01
Connector  0B  <-> Port 19
Connector  1B  <-> Port 21
Connector  2B  <-> Port 23
Connector  3B  <-> Port 25
Connector  4B  <-> Port 27
Connector  5B  <-> Port 29
Connector  6B  <-> Port 36
Connector  7B  <-> Port 34
Connector  8B  <-> Port 32
Connector  9B  <-> Port 13
Connector 10B  <-> Port 15
Connector 11B  <-> Port 17
Connector 12B  <-> Port 12 
Connector 13B  <-> Port 10 
Connector 14B  <-> Port 08 
Connector 15B  <-> Port 06
Connector 16B  <-> Port 04
Connector 17B  <-> Port 02
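
The mapping above can be held as a lookup table, for example to find the port number to pass to getportstatus for a given connector. The Python sketch below is illustrative: the port values are copied from the table, and the variable names are hypothetical.

```python
# I4 port numbers copied from the table above, indexed by connector
# position 0..17 for the A row and the B row. Names are hypothetical.
ROW_A_PORTS = [20, 22, 24, 26, 28, 30, 35, 33, 31,
               14, 16, 18, 11, 9, 7, 5, 3, 1]
ROW_B_PORTS = [19, 21, 23, 25, 27, 29, 36, 34, 32,
               13, 15, 17, 12, 10, 8, 6, 4, 2]

CONNECTOR_TO_PORT = {}
for position, port in enumerate(ROW_A_PORTS):
    CONNECTOR_TO_PORT[f"{position}A"] = port
for position, port in enumerate(ROW_B_PORTS):
    CONNECTOR_TO_PORT[f"{position}B"] = port

# Reverse lookup: I4 port number to connector name.
PORT_TO_CONNECTOR = {port: conn for conn, port in CONNECTOR_TO_PORT.items()}


if __name__ == "__main__":
    print("Connector 8A is I4 port", CONNECTOR_TO_PORT["8A"])
    print("I4 port 36 is connector", PORT_TO_CONNECTOR[36])
```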

 

Physical Connector Numbering (on front panel)

 

 
  0A  1A  2A     3A  4A  5A     6A  7A  8A     9A 10A 11A     12A 13A 14A    15A 16A 17A
  -----------    -----------    -----------    -----------    -----------    -----------
 |   |   |   |  |   |   |   |  |   |   |   |  |   |   |   |  |   |   |   |  |   |   |   |
  -----------    -----------    -----------    -----------    -----------    -----------

  -----------    -----------    -----------    -----------    -----------    -----------
 |   |   |   |  |   |   |   |  |   |   |   |  |   |   |   |  |   |   |   |  |   |   |   |
  -----------    -----------    -----------    -----------    -----------    -----------
  0B  1B  2B     3B  4B  5B     6B  7B  8B     9B 10B 11B     12B 13B 14B    15B 16B 17B

 

Performance


 

 

Lab


NM2 36p in BUR

# ssh -l root 10.152.223.222

NM2 36p in BRM

# ssh -l root 10.80.23.26

 

Contacts


Email Alias

INFINIBAND_TRIAGE_US@oracle.com

 

IM Chat Rooms

Exadata IM Room - gcs_eest

L0 Hub - gcs_hub_exadata

 

Proactive


 

 

Sun Datacenter InfiniBand Switch 36 Documentation Library

 

Sun Datacenter InfiniBand Switch 36 Getting Started Guide                                      820-7750 (shipped with switch)

Sun Datacenter InfiniBand Switch 36 Product Notes                                              820-7748

Sun Datacenter InfiniBand Switch 36 User's Guide                                               820-7746

Sun Datacenter InfiniBand Switch 36 Command Reference                                          820-7747

Sun Datacenter InfiniBand Switch 36 Safety and Compliance Guide                                820-7749

Sun Datacenter InfiniBand Switch 36 Topic Set                                                  835-0784

Sun Datacenter InfiniBand Switch 36 and 72 Integrated Lights Out Manager (ILOM) 3.0 Supplement 821-1080

Sun System Handbook

 

Training:

Course Title:         Engineering TOI: Infiniband Overview and Driver Update for TSC Session 1
Course Description:
This TOI will provide an overview of the latest Infiniband technology and the latest driver stacks implemented in Solaris. This TOI will also discuss the new realigned Infiniband SR (Service Request) coverage between TSC networking and storage driver group.

Replay Duration: 94 minutes
Availability:         Employee
EDP Link: http://edp.oraclecorp.com/lms/faces/lms/edp/plancurrassignments.jspx?planid=178481
iLearning Link: http://oukc.oracle.com/static09/opn/login/?t=checkusercookies|r=-1|c=1234484941


Course Title:         Engineering TOI: InfiniBand Session 2
Course Description:
Introduction to Ethernet over InfiniBand technology using Oracle's InfiniBand Gateway Switch.
An overview of building blocks inside this Gateway switch, Virtual HUBs, Virtual NICs and use of Link Aggregation. How to administer EoIB VNICs in a datacenter and their use on compute nodes running Solaris and Linux. Introduction to ILOM snapshots from InfiniBand switches and their use in problem analysis.

Replay Duration: 94 minutes
Availability:         Partner and Employee
EDP Link: http://edp.oraclecorp.com/lms/faces/lms/edp/plancurrassignments.jspx?planid=176825
iLearning Link: http://oukc.oracle.com/static09/opn/login/?t=checkusercookies|r=-1|c=1218512234


  Copyright © 2018 Oracle, Inc.  All rights reserved.