Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1002408.1
Update Date:2017-10-11

Solution Type  Technical Instruction Sure

Solution  1002408.1 :   Sun Fire[TM] 12K/15K: showdevices fails with "dcs: <11989> resource info init error (1)"  


Related Items
  • Sun Fire 15K Server
  • Sun Fire E20K Server
  • Sun Fire E25K Server
  • Sun Fire 12K Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: SF-Exxk
  • _Old GCS Categories>Sun Microsystems>Boards>Sun Fire Link

PreviouslyPublishedAs
203371


Applies to:

Sun Fire 12K Server - Version All Versions and later
Sun Fire 15K Server - Version All Versions and later
Sun Fire E20K Server - Version All Versions and later
Sun Fire E25K Server - Version All Versions and later
All Platforms

Goal

showdevices gathers device information from one or more Sun Fire[TM] high-end system domains. The command uses dca(1M) as a proxy to gather the information from the domains.

This article discusses a specific condition in which the "showdevices -d <domain-tag>" command fails as follows:

v4u-15ka-sc1:sms-svc:24> showdevices -d A
Unable to get device information from domain A

Apart from this failure, all other interactions between the main system controller (SC) and the domain proceed as expected:

SC -- rcfgadm -d <domain-tag> completes successfully
DR ops initiated from the SC -- addboard and/or deleteboard complete successfully
DR ops initiated from the SC -- rcfgadm -c disconnect and/or rcfgadm -c configure complete successfully

Solution

The following log messages were observed during the "showdevices" failure:

Domain OS's /var/adm/messages

     Sep  8 12:27:27 v4u-15ka-a dcs: [ID 985267 daemon.error] <12073> resource info init error (1)

SMS platform logs

     Sep  8 12:27:05 2005 v4u-15ka-sc1 showdevices[18788]: [0 3638613703618192 ERR ri_init.cc 85] rcfgaRequestProxy->ri_init failed. status= 4318

SMS domain logs

     Sep  8 12:27:04 2005 v4u-15ka-sc1 dca[7635]-A(): [4318 3638612328642665 ERR DCSInterface.cc 378] message receive failed: DCSInterface :: receiveResponse errCode:500
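The three signatures above can be picked out of a log with a small filter. The following is a minimal sketch, assuming a POSIX shell such as ksh or bash; the function name and the labels it prints (domain-dcs, sc-showdevices, sc-dca) are hypothetical and chosen for illustration only, not SMS terminology.

```shell
# Hypothetical helper: label each log line read from stdin by the
# signature it matches. Lines matching none of the three are skipped.
classify_showdevices_log() {
    while IFS= read -r line; do
        case "$line" in
            *"resource info init error"*) echo "domain-dcs" ;;      # dcs on the domain OS
            *"ri_init failed"*)           echo "sc-showdevices" ;;  # showdevices on the SC
            *"receiveResponse errCode:"*) echo "sc-dca" ;;          # dca on the SC
        esac
    done
}
```

For example, running "classify_showdevices_log < /var/adm/messages" on the domain prints one domain-dcs line for each matching entry.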

The "showdevices" failure persists even after the DCA service on the SC and the DCS service on the domain OS have been terminated and restarted. It also persisted after the domain was keyswitched off and on.

Further investigation produced the following observations.

Running truss against the DCS service on the domain OS while executing "showdevices" from the main SC showed that DCS first resolved the symbolic link for the path name "/dev/wci17d" using resolvepath():

     12105:  resolvepath("/dev/wci17d", "/devices/SUNW,wci@17d,0:SUNW,wci-rsm", 1024) = 36

It was then unable to consult the database of /devices to /dev mappings (i.e. /dev/.devlink_db) to complete the resolution of the device to its /devices mapping.

These observations indicate that the problem stems from the /devices to /dev mappings not being populated as expected:

     # strings .devlink_db | grep -i wci
     /SUNW,wci@17d,0
     SUNW,wci-rsm
     wci17d
     ../devices/SUNW,wci@17d,0:SUNW,wci-rsm

This happened because the WCI Remote Shared Memory device driver (i.e. wrsm) had not been loaded into the kernel (modload), which left the resulting device node in a "not attached" state:

     SUNW,wci (driver not attached)
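A "not attached" node of this kind can be detected with a short filter. This is a minimal sketch, assuming the line above comes from prtconf-style output on the domain OS; the function name is hypothetical. Output is sent to /dev/null rather than using grep -q, which may not be available in every grep implementation.

```shell
# Hypothetical helper: succeed (exit status 0) if stdin contains a
# SUNW,wci node reported as "driver not attached".
# Input is assumed to be prtconf-style device tree output.
wci_driver_unattached() {
    grep 'SUNW,wci' | grep 'driver not attached' >/dev/null
}
```

A possible use on the domain is: prtconf | wci_driver_unattached && echo "SUNW,wci has no driver attached"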

The "showdevices" failure can therefore be addressed by ensuring that the "wrsm" device driver is loaded when the domain OS boots. To do this, run devfsadm with no options as root, or perform a reconfiguration boot (boot -r).
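Before applying the fix, it may help to confirm whether "wrsm" is already loaded. The sketch below assumes that modinfo output on the domain lists the module under the name "wrsm"; the function name and the messages it prints are hypothetical. It only prints advice and never runs devfsadm or reboots the domain.

```shell
# Hypothetical helper: read modinfo output on stdin and print the
# suggested next step. Advice only; nothing is executed.
wrsm_next_step() {
    if grep -w 'wrsm' >/dev/null; then
        echo "wrsm loaded: no action needed"
    else
        echo "wrsm not loaded: run devfsadm as root, or boot -r"
    fi
}
```

A possible use on the domain is: modinfo | wrsm_next_step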




Internal Comments

see CR 4530555.


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.