Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type Predictive Self-Healing Sure Solution 1396556.1 : Datacenter Switch36
Oracle Confidential PARTNER - Available to partners (SUN).
Applies to:
Sun Datacenter InfiniBand Switch 36 - Version Not Applicable and later
Information in this document applies to any platform.
Purpose
This document contains the Datacenter Switch36 Product Page.
Details
Product Support Team
Solaris Networking Team: Infiniband Switches, PLA: SN-SND: Sun Network Infiniband
Storage Driver Team: Components of Solaris Infiniband Stack, PLA: SN-DK: Storage Drivers
Alerts
Workaround: power-cycle the switch. It may be necessary to remove both power cords for a minute or so and then re-attach them. Fixed in FW 1.1.3-2b.
Workaround: power-cycle the switch. It may be necessary to remove both power cords for a minute or so and then re-attach them. This hang has a different cause from the hang described in CR#6931851 and has not yet been fully root-caused. FW 1.1.4 has been built with a watchdog that detects this hang and resets the switch automatically. This FW is available via MOS (not qualified for Exadata systems).
Description
Sun Datacenter InfiniBand Switch 36 (M2-36p)
Sun Datacenter InfiniBand Switch 36 Firmware Version 1.3
Also known as DCS 36 or SDS 36 (also used in the Oracle Exadata V2 Machine)
This is an Infiniband leaf switch which can also be utilized in stand-alone mode or as a fabric manager node in a small cluster of switches. The Sun Datacenter Switch 36 is housed in a 19" 1RU chassis and consists of a system board with one Mellanox Infiniscale IV switch chip. It has 12 stacked QSFP connectors, each pair providing six 4X Infiniband ports for a total of 36 ports. Each port is capable of QDR, DDR or SDR speeds and the switch provides fully non-blocking connections with a total data throughput of 2.3 Tb/sec (bidirectional). Latency between ports is 100nsec at QDR rate.
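The 2.3 Tb/sec figure can be checked with a short calculation. The sketch below assumes the usual InfiniBand QDR parameters (10 Gb/s signaling per lane and 8b/10b encoding), which are not stated in this document. The constant and function names are illustrative only.

```python
# Assumed InfiniBand QDR parameters (not taken from this document).
LANES_PER_PORT = 4            # a 4X port bundles four lanes
SIGNAL_GBPS_PER_LANE = 10     # QDR signaling rate per lane
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b line encoding
PORTS = 36                    # from the description above


def bidirectional_data_tbps(ports=PORTS):
    """Aggregate data throughput in Tb/s, counting both directions."""
    per_port_gbps = LANES_PER_PORT * SIGNAL_GBPS_PER_LANE * ENCODING_EFFICIENCY
    return ports * per_port_gbps * 2 / 1000


print(round(bidirectional_data_tbps(), 3))
```

Under these assumptions each port carries 32 Gb/s of data per direction, so 36 ports in both directions give roughly 2.3 Tb/sec, which matches the figure quoted above.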
Management Access
There are two RJ45 10/100/1000T Ethernet ports for management access. These ports are connected to an internal Broadcom BCM5384KPMG Ethernet switch, which allows multiple DCS 36 switches to be daisy-chained together for out-of-band management. There is also a USB port, connected to the service processor and fitted with over-current protection, that can be used to provide serial access to the management function. An on-board COM (Computer-on-Module) hosts the IB Fabric Manager software and the internal chassis management software.
[Figure: Connector-side (rear) view of the DCS 36]
[Figure: Fan-side (front) view of the DCS 36]
The DCS 36 has one fan board that can support five fan modules, but only the three innermost modules are present. All three are hot-swappable.
Note: The airflow is from the fans, through the chassis and out of the connector panel. The front of the chassis (fan end) draws air from the cold aisle, and warm air is exhausted from the rear of the chassis (connector end) to the hot aisle.
There are two power distribution boards with two hot-swappable power supplies, each with its own power cord, shipped as standard.
The embedded IB Fabric Manager supports active/hot-standby dual manager configurations. IPMI and Shelf Management functions are included.
Passive QSFP copper cables up to 5m are supported. Cable insertion detection is implemented, and the cable serial number stored in the cable NVRAM can be read by the embedded management module.
For very large Infiniband fabrics it is recommended to use a host-based subnet manager, to ensure that sufficient CPU power is available to manage the fabric.
Rear Status LEDs
The network management port status LEDs are located on the network management connector at the left side of the rear panel (cable connector side).
Link (left-hand LED):
    Green On = 1000 Mb link
    Off = link down
    Amber On = 100 Mb or 10 Mb link
Activity (right-hand LED):
    Green On = link up
    Flashing = packet activity
The link status LEDs are located at the InfiniBand connectors of the rear panel.
Link:
    Green On = link established
    Off = link down
    Flashing = symbol errors
The chassis status LEDs are located on the right side of the rear panel.
Top: Locator (white)
    On = no function
    Off = disabled
    Flashing = switch is identifying itself
Middle: Attention (amber)
    On = normal fault detected
    Off = no faults detected
    Flashing = critical fault detected
Bottom: OK (green)
    On = switch is functional
    Off = switch is off or initializing
    Flashing = no function
Front Status LEDs
The power supply status LEDs are located to the left side of each power supply at the front of the switch chassis.
Top: OK (green)
    On = 12v DC is supplied
    Off = no DC voltage is present
    Flashing = power supply is disabled, 12v DC is not supplied
Middle: Attention (amber)
    On = fault detected, 12v DC shutdown
    Off = no faults detected
    Flashing = no function
Bottom: AC (green)
    On = AC power present and good
    Off = AC power not present
    Flashing = fault or over voltage
The fan status LEDs are located in the lower right of each fan module at the front of the switch chassis.
Attention:
    On = fan is faulty
    Off = no fault
Versions
Firmware
To display the firmware version:
    # version
    SUN DCS 36p version: 1.3.3-2
    Build time: Mar 25 2010 10:00:23
    SP board info:
    Manufacturing Date: 2009.06.22
    Serial Number: "NCD3R0442"
    Hardware Revision: 0x0006
    Firmware Revision: 0x0102
    BIOS version: NOW1R112
    BIOS date: 04/24/2009
    #
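When the version output has been captured to a file or log, the firmware level can be extracted for comparison. The following is a minimal sketch that assumes the "SUN DCS 36p version:" line always has the form shown in the sample above. The function name `parse_version` is hypothetical and is not part of the switch software.

```python
import re


def parse_version(text):
    """Return ((major, minor, ...), build_suffix) from captured `version` output.

    Assumes the 'SUN DCS 36p version: X.Y.Z-N' layout seen in the sample;
    returns None when no such line is found.
    """
    match = re.search(r"SUN DCS 36p version:\s*(\d+(?:\.\d+)*)(?:-(\w+))?", text)
    if match is None:
        return None
    numbers = tuple(int(part) for part in match.group(1).split("."))
    return numbers, match.group(2)


sample = "SUN DCS 36p version: 1.3.3-2 Build time: Mar 25 2010 10:00:23"
numbers, build = parse_version(sample)
print(numbers, build)
print(numbers >= (1, 1, 3))  # tuple comparison orders versions numerically
```

Comparing tuples of integers avoids the string-comparison trap where "1.10" would sort before "1.3".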
Hardware
Cables
The DCS 36 has QSFP ports, so the cables to connect QDR HCAs are:
(1) X2121A can be used for both Infiniband and 10GbE (X2886 are no longer orderable)
Compatibility/Patches
FW updates are available on MOS.
Note: Users of FW version 1.0.1 will need to upgrade to 1.1.3 or 1.1.4 before upgrading to 1.3.3 (for Exadata switches apply patch 12373676 in conjunction with 11891229)
Configuration
The DCS 36 has two connection options for communication with the management controller.
FAQ
Exalogic FAQ
Infiniband Triage
Information Gathering
On the DCS 36 Infiniband switch the following log files are available
(the opensm files may not contain much information if the SDS 36 is not the Subnet Manager Master)
The following utilities can be used to collect data:
Specific to the SDS 36 IB switch
Can be run on either the SDS 36 IB switch or a DBnode
Specific to DBnode
Example output of information gathering commands
listlinkup shows all the connectors on the DCS 36, whether they have a cable present and, if so, the mapping to the I4 port and the logical state of the link.
    # listlinkup
    Connector 0A Not present
    Connector 1A Not present
    Connector 2A Not present
    Connector 3A Not present
    Connector 4A Not present
    Connector 5A Not present
    Connector 6A Not present
    Connector 7A Not present
    Connector 8A Present <-> I4 Port 31 is up
    Connector 9A Present <-> I4 Port 14 is up
    Connector 10A Present <-> I4 Port 16 is up
    Connector 11A Present <-> I4 Port 18 is up
    Connector 12A Not present
    Connector 13A Present <-> I4 Port 09 is up
    Connector 14A Present <-> I4 Port 07 is up
    Connector 15A Present <-> I4 Port 05 is up
    Connector 16A Present <-> I4 Port 03 is up
    Connector 17A Present <-> I4 Port 01 is up
    Connector 0B Not present
    Connector 1B Not present
    Connector 2B Not present
    Connector 3B Not present
    Connector 4B Not present
    Connector 5B Not present
    Connector 6B Not present
    Connector 7B Not present
    Connector 8B Not present
    Connector 9B Present <-> I4 Port 13 is up
    Connector 10B Present <-> I4 Port 15 is up
    Connector 11B Present <-> I4 Port 17 is up
    Connector 12B Present <-> I4 Port 12 is up
    Connector 13B Present <-> I4 Port 10 is up
    Connector 14B Present <-> I4 Port 08 is up
    Connector 15B Present <-> I4 Port 06 is up
    Connector 16B Present <-> I4 Port 04 is up
    Connector 17B Present <-> I4 Port 02 is up
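Captured listlinkup output can be turned into a lookup table when many switches need to be compared. The sketch below assumes one connector per line in exactly the layout shown above. The function name `parse_listlinkup` is hypothetical, and link states other than "up" are assumed to appear as a single word in the same position.

```python
import re

# Matches either "Connector 7A Not present" or
# "Connector 8A Present <-> I4 Port 31 is up".
LINE_RE = re.compile(
    r"^Connector\s+(\d+[AB])\s+"
    r"(?:Not present|Present\s+<->\s+I4 Port\s+(\d+)\s+is\s+(\w+))\s*$"
)


def parse_listlinkup(text):
    """Map connector name -> None (no cable) or (I4 port number, link state)."""
    links = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip the prompt line and anything unrecognised
        connector, port, state = match.groups()
        links[connector] = None if port is None else (int(port), state)
    return links


sample = """\
# listlinkup
Connector 7A Not present
Connector 8A Present <-> I4 Port 31 is up
Connector 13A Present <-> I4 Port 09 is up
"""
links = parse_listlinkup(sample)
cabled = sorted(name for name, info in links.items() if info is not None)
print(cabled)
```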
Installation
The following are shipped with the DCS 36 switch
De-installation
Troubleshooting
To check the overall health of the switch:
    # showunhealthy
Check the power supplies:
    # checkpower
Check the fans (only three are installed in the DCS 36):
    # getfanspeed
Note: If there are fewer than two operational fans the DCS 36 will shut down to prevent thermal overload.
Check the Infiniband ASIC:
    # checkboot 0
Or, to run all tests:
    # env_test
    NM2 Environment test started:
    Starting Voltage test:
    Voltage ECB OK
    Measured 3.3V Main = 3.28 V
    Measured 3.3V Standby = 3.44 V
    Measured 12V = 12.00 V
    Measured 5V = 5.03 V
    Measured VBAT = 3.09 V
    Measured 2.5V = 2.51 V
    Measured 1.8V = 1.80 V
    Measured I4 1.2V = 1.22 V
    Voltage test returned OK
    Starting PSU test:
    PSU 0 present
    PSU 1 present
    PSU test returned OK
    Starting Temperature test:
    Back temperature 33.38
    Front temperature 35.50
    ComEx temperature 38.12
    I4 temperature 55, maxtemperature 57
    Temperature test returned OK
    Starting FAN test:
    Fan 0 not present
    Fan 1 running at rpm 12684
    Fan 2 running at rpm 12946
    Fan 3 running at rpm 12684
    Fan 4 not present
    FAN test returned OK
    Starting Connector test:
    Connector test returned OK
    Starting I4 test:
    I4 OK
    All I4s OK
    I4 test returned OK
    NM2 Environment test PASSED
    #
Other utilities available:
Check the internal temperatures:
    # showtemps
Check voltages:
    # checkvoltages
Display link status:
    # listlinkup
Display a port status:
    # getportstatus 0 port
(where port can be from 1 through 36)
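Saved env_test output can be summarised automatically when triaging several switches. The sketch below is illustrative only: `summarize_env_test` is a hypothetical helper, and it assumes the "Fan N running at rpm R" and "NM2 Environment test PASSED" strings appear exactly as in the sample above. The two-fan threshold comes from the note in this section.

```python
import re

MIN_OPERATIONAL_FANS = 2  # below this the DCS 36 shuts down (see note above)


def summarize_env_test(text):
    """Extract running fan speeds and the overall verdict from env_test output."""
    running = {
        int(index): int(rpm)
        for index, rpm in re.findall(r"Fan (\d+) running at rpm (\d+)", text)
    }
    return {
        "running_fans": running,
        "enough_fans": len(running) >= MIN_OPERATIONAL_FANS,
        "passed": "NM2 Environment test PASSED" in text,
    }


sample = (
    "Starting FAN test: Fan 0 not present Fan 1 running at rpm 12684 "
    "Fan 2 running at rpm 12946 Fan 3 running at rpm 12684 Fan 4 not present "
    "FAN test returned OK NM2 Environment test PASSED"
)
summary = summarize_env_test(sample)
print(summary)
```

Because the patterns do not depend on line breaks, the same function works on output that has been flattened onto a single line.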
Internal Port Mapping
Connector 0A <-> Port 20
Connector 1A <-> Port 22
Connector 2A <-> Port 24
Connector 3A <-> Port 26
Connector 4A <-> Port 28
Connector 5A <-> Port 30
Connector 6A <-> Port 35
Connector 7A <-> Port 33
Connector 8A <-> Port 31
Connector 9A <-> Port 14
Connector 10A <-> Port 16
Connector 11A <-> Port 18
Connector 12A <-> Port 11
Connector 13A <-> Port 09
Connector 14A <-> Port 07
Connector 15A <-> Port 05
Connector 16A <-> Port 03
Connector 17A <-> Port 01
Connector 0B <-> Port 19
Connector 1B <-> Port 21
Connector 2B <-> Port 23
Connector 3B <-> Port 25
Connector 4B <-> Port 27
Connector 5B <-> Port 29
Connector 6B <-> Port 36
Connector 7B <-> Port 34
Connector 8B <-> Port 32
Connector 9B <-> Port 13
Connector 10B <-> Port 15
Connector 11B <-> Port 17
Connector 12B <-> Port 12
Connector 13B <-> Port 10
Connector 14B <-> Port 08
Connector 15B <-> Port 06
Connector 16B <-> Port 04
Connector 17B <-> Port 02
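The mapping above can be held as a pair of lookup tables, for example to find the port number to pass to getportstatus for a given physical connector. This is a minimal sketch: the port lists are transcribed from the table above, and the variable names are illustrative only.

```python
# I4 port numbers for connectors 0A..17A and 0B..17B, in connector order,
# transcribed from the internal port mapping table.
A_PORTS = [20, 22, 24, 26, 28, 30, 35, 33, 31, 14, 16, 18, 11, 9, 7, 5, 3, 1]
B_PORTS = [19, 21, 23, 25, 27, 29, 36, 34, 32, 13, 15, 17, 12, 10, 8, 6, 4, 2]

CONNECTOR_TO_PORT = {f"{i}A": port for i, port in enumerate(A_PORTS)}
CONNECTOR_TO_PORT.update({f"{i}B": port for i, port in enumerate(B_PORTS)})

# Reverse lookup: I4 port number -> physical connector.
PORT_TO_CONNECTOR = {port: name for name, port in CONNECTOR_TO_PORT.items()}

# Sanity check: every port from 1 through 36 is used exactly once.
assert sorted(PORT_TO_CONNECTOR) == list(range(1, 37))

print(CONNECTOR_TO_PORT["8A"])
print(PORT_TO_CONNECTOR[36])
```

The sanity check catches transcription errors, since a duplicated or missing port number would leave the reverse table with fewer than 36 entries.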
Physical Connector Numbering (on front panel)
Upper row: 0A 1A 2A 3A 4A 5A 6A 7A 8A 9A 10A 11A 12A 13A 14A 15A 16A 17A
Lower row: 0B 1B 2B 3B 4B 5B 6B 7B 8B 9B 10B 11B 12B 13B 14B 15B 16B 17B
Performance
Lab
NM2 36p in BUR
    # ssh -l root 10.152.223.222
NM2 36p in BRM
    # ssh -l root 10.80.23.26
Contacts
Email Alias: INFINIBAND_TRIAGE_US@oracle.com
IM Chat Rooms
Exadata IM Room - gcs_eest
L0 Hub - gcs_hub_exadata
Proactive
Sun Datacenter InfiniBand Switch 36 Documentation Library
Sun Datacenter InfiniBand Switch 36 Getting Started Guide 820-7750 (shipped with switch)
Sun Datacenter InfiniBand Switch 36 Product Notes 820-7748
Sun Datacenter InfiniBand Switch 36 User's Guide 820-7746
Sun Datacenter InfiniBand Switch 36 Command Reference 820-7747
Sun Datacenter InfiniBand Switch 36 Safety and Compliance Guide 820-7749
Sun Datacenter InfiniBand Switch 36 Topic Set 835-0784
Sun Datacenter InfiniBand Switch 36 and 72 Integrated Lights Out Manager (ILOM) 3.0 Supplement 821-1080
Training:
Course Title: Engineering TOI: Infiniband Overview and Driver Update for TSC Session 1
Attachments
This solution has no attachment