Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2028757.1
Update Date:2018-03-05
Keywords:

Solution Type  Problem Resolution Sure

Solution  2028757.1 :   Sun Fire[TM] 12K/15K/E20K/E25K: System Controller Failed To Configure Floating Interface.  


Related Items
  • Sun Fire 12K Server
  • Sun Fire 15K Server
  • Sun Fire E25K Server
  • Sun Fire E20K Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: SF-Exxk




In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-11003093311>

Applies to:

Sun Fire 12K Server - Version All Versions to All Versions [Release All Releases]
Sun Fire E25K Server - Version All Versions to All Versions [Release All Releases]
Sun Fire 15K Server - Version All Versions to All Versions [Release All Releases]
Sun Fire E20K Server - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

After an SC failover triggered by an external network failure, the new MAIN SC (System Controller) cannot configure the C1 network floating IP address,
and fomd reports an error such as: "Failed to configure the logical interface - configFloatingIFs() failed (ecode = 256)"

Cause

An external network problem can leave the C1 IPMP groups on both SC0 and SC1 in an unstable state.
This triggers an SC failover, and the subsequent attempt to configure the C1 floating interface fails.

For more information about C1 IPMP network functionality, refer to the following document:
"Sun Fire[TM] 12K/15K/E20K/E25K: Main System Controller's Community Network (C1) interface failure causes a failover onto the Spare SC. (Doc ID 1006489.1)"

 
Detailed explanation of the scenario

The standby SC took over the MAIN role because of an external network failure on the active SC:

 fomd[4800]: [8518 19826248349761597 NOTICE FailoverMgr.cc 2531] Failover is in a failed state because of the following fault(s): External Network Failure
 fomd[4800]: [8569 19826423723633425 NOTICE FailoverMgr.cc 1377] The external network test PASSED
 fomd[4800]: [8575 19826474461434681 NOTICE FailoverMgr.cc 2498] Failover activated
 fomd[4800]: [8573 19826708066097189 NOTICE FailoverMgr.cc 2293] Taking over the main role because the remote SC (current Main) has a fault - External Network Failure
 fomd[4800]: [8519 19826708082074747 NOTICE FailoverMgr.cc 2631] Failover deactivated
 fomd[4800]: [8570 19826723260924732 NOTICE FailoverMgr.cc 2356] Reset the remote SC

The previously active SC reported IPMP issues:

 in.mpathd[96]: [ID 594170 daemon.error] NIC failure detected on eri3 of group C1
 in.mpathd[96]: [ID 832587 daemon.error] Successfully failed over from NIC eri3 to NIC eri0
 in.mpathd[96]: [ID 168056 daemon.error] All Interfaces in group C1 have failed
 in.mpathd[96]: [ID 620804 daemon.error] Successfully failed back to NIC eri0
 in.mpathd[96]: [ID 299542 daemon.error] NIC repair detected on eri0 of group C1
 in.mpathd[96]: [ID 237757 daemon.error] At least 1 interface (eri0) of group C1 has repaired
 in.mpathd[96]: [ID 832587 daemon.error] Successfully failed over from NIC eri3 to NIC eri0

before it was forced to reboot by the reset issued from the standby SC:

 genunix: [ID 672855 kern.notice] syncing file systems...
 genunix: [ID 904073 kern.notice] done
 genunix: [ID 540533 kern.notice] ^MSunOS Release 5.x Version Generic 64-bit

The same IPMP issue is reported by the standby SC while attempting to take the MAIN role:

 in.mpathd[97]: [ID 594170 daemon.error] NIC failure detected on eri0 of group C1
 in.mpathd[97]: [ID 832587 daemon.error] Successfully failed over from NIC eri0 to NIC eri3

 in.mpathd[97]: [ID 168056 daemon.error] All Interfaces in group C1 have failed
 in.mpathd[97]: [ID 620804 daemon.error] Successfully failed back to NIC eri0

 in.mpathd[97]: [ID 299542 daemon.error] NIC repair detected on eri0 of group C1
 in.mpathd[97]: [ID 237757 daemon.error] At least 1 interface (eri0) of group C1 has repaired
 in.mpathd[97]: [ID 620804 daemon.error] Successfully failed back to NIC eri3
 in.mpathd[97]: [ID 299542 daemon.error] NIC repair detected on eri3 of group C1
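
To judge how unstable the C1 group is, it can help to count the NIC failures that in.mpathd logged per interface. The following is only a sketch: the function name count_nic_failures is hypothetical, and it assumes the in.mpathd messages are written to a syslog file such as /var/adm/messages, which depends on the local syslog configuration.

```shell
# Hypothetical helper: count "NIC failure detected" events per interface.
# $1 = syslog file to scan (assumption: e.g. /var/adm/messages)
# $2 = IPMP group name (defaults to C1)
count_nic_failures() {
    grep 'in\.mpathd' "$1" |
        sed -n "s/.*NIC failure detected on \([^ ]*\) of group ${2:-C1}.*/\1/p" |
        sort | uniq -c |
        awk '{ print $2, $1 }'   # print "<interface> <count>"
}
```

For example, "count_nic_failures /var/adm/messages C1" prints one line per interface with its failure count. Repeated failures on both eri0 and eri3 within a short time point to a problem outside the SC.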

Because the interfaces in the C1 group kept switching, the floating IP could not be configured during SMS reconfiguration, although the SC still reached the MAIN role.
The following errors appear in /var/opt/SUNWSMS/adm/platform/messages:

 mand[4735]: [7744 19826804578644722 ERR MandFailoverService.cc 469] /sbin/sh -c 'ifconfig eri0 addif <FLOATER IP> netmask + broadcast + failover up>/dev/null 2>&1' failed. Failed to configure floating interface.
 mand[4735]: [7744 19826805004930465 ERR MandFailoverService.cc 469] /sbin/sh -c 'ifconfig eri3 addif <FLOATER IP> netmask + broadcast + failover up>/dev/null 2>&1' failed. Failed to configure floating interface.
 fomd[4800]: [8563 19826805144325510 WARNING FOConfig.cc 183] Failed to configure the logical interface - configFloatingIFs() failed (ecode = 256)
 fomd[4800]: [8576 19826807763796419 NOTICE FailoverMgr.cc 2481] SC configured as Main

In addition, the second virtual interface eri0:2 (the floating interface) is missing. Only eri0:1, the IPMP failover interface, exists, as "ifconfig -a" on the current MAIN SC shows.
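
The log entries quoted above can be located with a simple search of the platform log. This is a minimal sketch: the function name find_floater_errors is hypothetical, the search strings are taken from the messages shown above, and a grep that accepts repeated -e options is assumed (on Solaris that may mean /usr/xpg4/bin/grep).

```shell
# Hypothetical helper: print floating-interface configuration failures.
# $1 = log file (defaults to the platform log path named in this document)
find_floater_errors() {
    log="${1:-/var/opt/SUNWSMS/adm/platform/messages}"
    # Basic regular expressions: the parentheses match literally.
    grep -e 'Failed to configure floating interface' \
         -e 'configFloatingIFs() failed' "$log"
}
```

Called without an argument, the function searches the default platform log. No output means that neither message was found in that file.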
 

Solution

The C1 floating IP was not configured during SMS startup on the standby SC because the underlying IPMP group C1 was not stable.

Check the external network and resolve all issues there. Pay particular attention to the default router, because IPMP uses it as the probe target.
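
As a starting point for the router check, the default gateway can be read from the routing table. The sketch below assumes Solaris-style "netstat -rn" output in which the default route is listed with the destination "default"; the function name default_router is hypothetical.

```shell
# Hypothetical helper: print the gateway of every default route.
# Reads "netstat -rn" output on standard input.
default_router() {
    awk '$1 == "default" { print $2 }'
}
```

A typical use would be "netstat -rn | default_router", followed by a ping of the printed address from both SCs to confirm that the router answers reliably.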

Once all external network issues have been resolved, restarting SMS on the MAIN SC (the one that recently took over the role) should plumb the eri0:2 interface as expected,
with the floating IP configured for that platform in the MAN.cf file by setupplatform.

Course of action to resolve:

The suggested action plan does not require an outage of any running domain(s).


Turn off SC failover and confirm that it is disabled:

sms-svc%  setfailover off
sms-svc%  showfailover
SC Failover Status:     DISABLED

As user root, stop SMS on the MAIN SC, wait a minute, and start it again:

#  /etc/init.d/sms stop
<wait>
#  /etc/init.d/sms start

Wait until the MAIN role has been established again, then check the SC role:

sms-svc% showfailover -r

Verify that the floating interface (eri0:2) is now plumbed:

# ifconfig -a
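
The check of the "ifconfig -a" output can also be scripted. The following sketch assumes the usual layout in which each interface name starts a line and its address follows on an indented "inet" line; the function name find_floater_if is hypothetical, and a POSIX awk is assumed (on Solaris that may mean nawk or /usr/xpg4/bin/awk).

```shell
# Hypothetical helper: print the interface that carries a given IP address.
# $1 = floating IP address; "ifconfig -a" output is read on standard input.
# Returns 0 if the address is configured, 1 otherwise.
find_floater_if() {
    awk -v ip="$1" '
        /^[^ \t]/ { iface = $1; sub(/:$/, "", iface) }   # interface header line
        $1 == "inet" && $2 == ip { print iface; found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

For example, "ifconfig -a | find_floater_if <FLOATER IP>" should print eri0:2 after a successful SMS restart. A non-zero return status means the floating address is still not configured on any interface.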

  

References

<NOTE:1629452.1> - Sun Fire[TM] 12K/15K/E20K/E25K: System Controller's Community Network (C Network) and a typical IPMP HA configuration
<NOTE:1006489.1> - Sun Fire[TM] 12K/15K/E20K/E25K: Main System Controller's Community Network (C1) interface failure causes a failover onto the Spare SC.

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.