Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2146310.1
Update Date:2018-03-19
Keywords:

Solution Type  Problem Resolution Sure

Solution  2146310.1 :   High CPU utilization on NM2 InfiniBand Switches  


Related Items
  • Sun Datacenter InfiniBand Switch 36
  • Sun Network QDR InfiniBand Gateway Switch
Related Categories
  • PLA-Support>Sun Systems>SAND>Network>SN-SND: Sun Network Infiniband


This document outlines a workaround for high CPU utilization seen on NM2 InfiniBand switches.

In this Document
Symptoms
Cause
Solution


Applies to:

Sun Network QDR InfiniBand Gateway Switch - Version All Versions to All Versions [Release All Releases]
Sun Datacenter InfiniBand Switch 36 - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

NM2 InfiniBand switches may respond slowly to ssh logins, web logins, diagnostic commands, etc.

Enterprise Manager, when monitoring NM2 InfiniBand switches, may report higher than normal CPU utilization, i.e. above 80% or even the maximum of 100%.

Please refer to note 2152997.1 on how to use Enterprise Manager to monitor CPU on InfiniBand switches.

Users logged in to the shell of the IB switch may observe almost 0% idle time in the output of top and vmstat. Sample top output is provided below:

Tasks: 77 total, 2 running, 75 sleeping, 0 stopped, 0 zombie
Cpu(s): 53.6%us, 40.7%sy, 0.7%ni, 0.0%id, 0.0%wa, 4.6%hi, 0.3%si, 0.0%st
Mem: 510124k total, 86420k used, 423704k free, 11784k buffers
Swap: 0k total, 0k used, 0k free, 35232k cached

 PID USER PR NI VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
5445 root 20  0 6276 2876 1988 S 53.8  0.6 162:04.52 webgo
1819 root 20  0 3736 1376 1108 S  0.7  0.3   1:03.20 whereismaster
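
As a rough aid for reading the summary line above, the following Python sketch extracts the per-state percentages from a captured top "Cpu(s)" line and derives the busy percentage as 100 minus idle. The function name parse_top_cpu_line is hypothetical, and the sketch assumes the line has been copied off the switch and is parsed on a separate management host; it is not part of the switch firmware or ILOM.

```python
import re


def parse_top_cpu_line(line):
    """Map top's two-letter CPU state codes (us, sy, id, ...) to percentages.

    Hypothetical helper: expects a line shaped like the 'Cpu(s):' summary
    line printed by top, for example '53.6%us, 40.7%sy, ...'.
    """
    pairs = re.findall(r"(\d+(?:\.\d+)?)%\s*([a-z]{2})", line)
    return {state: float(value) for value, state in pairs}


# Summary line taken from the sample top output in this note.
sample = "Cpu(s): 53.6%us, 40.7%sy, 0.7%ni, 0.0%id, 0.0%wa, 4.6%hi, 0.3%si, 0.0%st"

cpu = parse_top_cpu_line(sample)
busy = 100.0 - cpu["id"]  # everything that is not idle time
print(f"idle={cpu['id']:.1f}% busy={busy:.1f}%")
```

For the sample line, the idle value is 0.0, so the derived busy figure is 100%, which matches the symptom described above.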

On the IB switch, the output of 'netstat -anp | grep webgo' may show a few TCP sessions in the CLOSE_WAIT state.
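
The netstat check can also be done on saved output. The sketch below counts TCP sessions in CLOSE_WAIT that belong to webgo, assuming the usual Linux 'netstat -anp' column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name). The function name count_close_wait is hypothetical, and the sample lines are invented for illustration (documentation-range addresses); they are not captured from a real switch.

```python
def count_close_wait(netstat_output, program="webgo"):
    """Count TCP sessions in CLOSE_WAIT owned by the given program.

    Hypothetical helper: assumes 'netstat -anp' style lines in which
    field 6 is the state and field 7 is 'PID/Program name'.
    """
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and non-TCP sockets.
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        state, owner = fields[5], fields[6]
        if state == "CLOSE_WAIT" and owner.endswith("/" + program):
            count += 1
    return count


# Illustrative input only; addresses and ports are made up.
SAMPLE = """\
tcp        0      0 0.0.0.0:443        0.0.0.0:*            LISTEN      5445/webgo
tcp        1      0 192.0.2.10:443     192.0.2.50:51234     CLOSE_WAIT  5445/webgo
tcp        1      0 192.0.2.10:80      192.0.2.50:51240     CLOSE_WAIT  5445/webgo
tcp        0      0 192.0.2.10:22      192.0.2.60:40022     ESTABLISHED 1203/sshd
"""

print(count_close_wait(SAMPLE))
```

A non-zero count alone does not confirm the problem; it is one supporting symptom alongside the high CPU usage of the webgo process.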

Cause

ILOM includes an internal web server component named 'webgo', which has a potential bug that causes high CPU utilization on the switch's management module (Kontron).

This problem is known to be triggered by a malformed POST request to the HTTP or HTTPS service running on ILOM. The Qualys security scanner has also triggered it internally in test and development environments.

Solution

Workaround Only

Until newer firmware containing the fix is installed, the only way to avoid this problem is to disable the HTTP and HTTPS services of ILOM.

To disable them:

1. Log in as the ilom-admin user, or alternatively log in as root and start an ILOM shell with the 'spsh' command.

2. At the ILOM shell prompt (->), execute the following commands:

-> set /SP/services/http secureredirect=disabled
-> set /SP/services/http servicestate=disabled

-> set /SP/services/https servicestate=disabled

-> exit

After completing the steps above, verify that CPU utilization comes back down to the normal range, which is usually below 70%.
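
One way to make that verification less dependent on a single reading is to average several idle samples, for example values read from the 'id' column of vmstat over a minute or so, and compare the resulting utilization with the 70% figure. The sketch below is only an illustration: the function name cpu_back_to_normal is hypothetical, and the sample values are invented rather than measured on a switch.

```python
def cpu_back_to_normal(idle_samples, threshold=70.0):
    """Return (is_normal, average_utilization) for a list of idle percentages.

    Hypothetical helper: utilization is taken as 100 minus idle, and
    'normal' means the average is below the threshold (70% per this note).
    """
    if not idle_samples:
        raise ValueError("need at least one idle sample")
    utilization = [100.0 - idle for idle in idle_samples]
    average = sum(utilization) / len(utilization)
    return average < threshold, average


# Invented idle readings, standing in for vmstat 'id' values.
before_workaround = [0.0, 0.0, 1.0]
after_workaround = [62.0, 58.0, 65.0, 60.0]

for label, samples in (("before", before_workaround), ("after", after_workaround)):
    ok, avg = cpu_back_to_normal(samples)
    print(f"{label}: average utilization {avg:.2f}%, normal={ok}")
```

The threshold parameter defaults to 70 only because that is the figure quoted in this note; sites with a different baseline may prefer another value.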


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.