Each numbered entry below describes one change/fault scenario, broken out into the fields of the original table: Details of Changes/Faults, Test(s), What Happens, Errors/Messages, Recovery Options, and Best Practices.
**1. Expandable values in DCU changed**

- **Details of Changes/Faults:** The expandable flag on a DCU is changed from true to false (or vice versa) after LDOMs are configured.
- **Test(s):** expandable property set from true to false. *Note: recovery mode not enabled.*
- **What Happens:** After a stop/start of HOSTx, the LDOM configuration falls back to factory-default.
- **Errors/Messages:**

  > WARNING: HOST expandable property must be set to true in order to boot config
  > WARNING: Unable to boot config_kon3 due to missing resources
  > WARNING: Falling back to factory-default
  > WARNING: Missing guest memory [0x100030000000:0x100830000000]
  > WARNING: Missing required memory resources to boot config
  > WARNING: Unable to boot 26oct15 due to missing resources
  > WARNING: Falling back to factory-default

- **Recovery Options:** See KM Doc 1640383.1.
- **Best Practices:**
  1. Good planning is needed in advance.
  2. Enable recovery mode so a degraded configuration can load.
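Enabling recovery mode is a recurring best practice in this note. On Oracle VM Server for SPARC 3.1 and later it is an SMF property of the `ldmd` service on the control domain; a minimal sketch (verify the property name and supported values against your ldmd version):

```
# svccfg -s ldmd setprop ldmd/recovery_mode = astring: auto
# svcadm refresh ldmd
```

With `recovery_mode` set to `auto`, the Logical Domains Manager attempts to build and boot a degraded configuration automatically when the saved configuration no longer matches the available hardware.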
**2. Adding CMU(s) to the original configuration**

- **Details of Changes/Faults:** CMU(s) are added to an existing DCU when LDOMs are already configured.
- **Test(s):** Added CMU1 to an existing half-populated CMU0/CMU3 PDOM/DCU, with: (1) ioreconfigure set to true or add_only; (2) ioreconfigure set to false.
- **What Happens:**
  1. If ioreconfigure is true or add_only: after start /HOSTx, the LDOM configuration falls back to factory-default due to missing IO resources.
     - With recovery mode enabled: after start /HOSTx, the LDOM configuration boots in degraded mode; the root complexes for the new CMU are added; root complexes and paths are reprogrammed; cards now associated with previously non-existent root complexes show as "unk" and are no longer assigned to a specific LDOM/guest (add-io required). See Doc ID 1540545.1 for the list of root complexes and paths for PCI cards.
  2. If ioreconfigure is false: after start /HOSTx, the LDOM configuration boots normally; root complexes and paths for the new CMU are neither added nor reprogrammed; all cards remain accessible.
- **Errors/Messages:**

  1.
  > WARNING: Missing IO resources to boot LDOM config
  > WARNING: Unable to boot 13042014 due to missing resources
  > WARNING: Falling back to factory-default
  > NOTICE: Booting config = factory-default

  2.
  > NOTICE: Booting config = cmu-1
  > DEBUG: Updating mdset-boot-reason prop: "0""cmu-1""cmu-1"
  > DEBUG: ldm_set_bootedconfig_name: New bootedcfg cmu-1, last bootedcfg factory-default

- **Recovery Options:** Set ioreconfigure to true, then remove the extra CMU(s) and start /HOSTx; the configuration should recover. *Note: if ioreconfigure is add_only, removing the CMU(s) will not rebuild the paths.*
- **Best Practices:**
  1. Good planning is needed in advance: either set ioreconfigure to false or prepare for a configuration change.
  2. Enable recovery mode so a degraded configuration can load.
  3. If the customer wants to permanently add CMUs, the only way is to plan ahead and redo the LDOM configuration.
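The ioreconfigure behavior above is controlled per host from the SP. A hypothetical ILOM session illustrating the checks (the exact /HOSTx target path and value set vary by platform and firmware; verify against the server's administration guide):

```
-> show /HOST0 ioreconfigure
-> set /HOST0 ioreconfigure=false
-> start /HOST0
```

Deciding on the ioreconfigure value before the hardware change is the "good planning" this entry refers to: false preserves the existing root-complex programming, while true or add_only triggers reprogramming on the next start.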
**3. Removing CMU(s) from the original configuration**

- **Details of Changes/Faults:** CMU(s) are removed from an existing DCU when LDOMs are already configured.
- **Test(s):** Removed 2x CMU from a full 4x CMU PDOM to make it a half DCU, with: (1) ioreconfigure set to false or add_only; (2) ioreconfigure set to true.
- **What Happens:**
  1. If ioreconfigure is add_only or false: after start /HOSTx, the LDOM configuration falls back to factory-default due to missing paths to the removed CMU(s)/CMP(s).
     - With recovery mode enabled: the configuration boots in degraded mode; the root complexes managed by the removed CMU(s)/CMP(s) disappear; no changes or reprogramming are made to the existing, remaining root complexes; the cards managed by the removed CMU(s)/CMP(s) are also unavailable; root complexes assigned to the guests are marked IOV.
  2. If ioreconfigure is true: after start /HOSTx, the LDOM configuration falls back to factory-default due to the change in paths to the root complexes.
     - With recovery mode enabled: the configuration boots in degraded mode; root complexes and paths are reprogrammed; the cards managed by the removed CMU(s)/CMP(s) are also unavailable; root complexes assigned to the guests are marked IOV.
- **Errors/Messages:**

  > WARNING: bootconfig not bootable: missing strand id 1280
  > WARNING: Missing required strand resources to boot config
  > WARNING: Unable to boot stef-full due to missing resources
  > DEBUG: Trying to fall back to degraded config
  > DEBUG: Can't open degraded cfg "stef-full" - rv = -6
  > DEBUG: Degraded config doesn't exist
  > WARNING: Falling back to factory-default
  > NOTICE: Booting config = factory-default
  > DEBUG: Updating mdset-boot-reason prop: "1""stef-full""factory-default"

- **Recovery Options:** Set ioreconfigure to true, then add the removed CMU(s) back and start /HOSTx; this should rebuild the paths and recover the configuration.
- **Best Practices:**
  1. Good planning is needed in advance, including steps to re-create the LDOMs manually if the customer wants to remove CMU(s) permanently.
  2. Enable recovery mode so a degraded configuration can load.
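Because a permanent CMU removal means redoing the LDOM configuration, the rebuilt configuration must also be saved back to the SP, or the next power cycle will boot the stale saved config again. A sketch using standard ldm subcommands from the control domain (`half-dcu-cfg` is a hypothetical name):

```
# ldm list-spconfig              # saved configs, marked [current] / [next poweron]
# ldm add-spconfig half-dcu-cfg  # save the rebuilt configuration to the SP
```

Checking `ldm list-spconfig` after any start /HOSTx is also a quick way to confirm whether the system came up on the intended config or fell back to factory-default.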
**4. CMP failure (root complex)**

- **Details of Changes/Faults:** CMP failure (root complex).
- **Test(s):** Blacklisted CMU15/CMP0 and CMP1 to avoid a 7-node config with CID > 480 and driving PCIE15. CMP disabled: PCIE15 assigned to ldg1, no CID assignment, only vcpu (150 out of 384), 384 < CID < 424 (strand id 3840 below not in use). *Note: recovery mode not enabled.*
- **What Happens:** Falls back to factory-default.
- **Errors/Messages:**

  > DEBUG: Strand not present: chip id = 30, smp_id = 3, local_chip_id = 6, local_strand_id = 0
  > WARNING: bootconfig not bootable: missing strand id 3840
  > WARNING: Missing required strand resources to boot config
  > WARNING: Unable to boot stef-4 due to missing resources
  > DEBUG: Trying to fall back to degraded config
  > DEBUG: Can't open degraded cfg "stef-4" - rv = -6
  > DEBUG: Degraded config doesn't exist
  > WARNING: Falling back to factory-default
  > NOTICE: Booting config = factory-default
  > DEBUG: Updating mdset-boot-reason prop: "1""stef-4""factory-default"
  > DEBUG: bootconfig differs from last boot
  > DEBUG: ldm_set_bootedconfig_name: New bootedcfg factory-default, last bootedcfg stef-4

- **Recovery Options:** Replace ASAP.
- **Best Practices:** Enable recovery mode so a degraded configuration can load.
**5. PCIE/IOU/CMU/DIMM failures**

- **Details of Changes/Faults:** PCIE/IOU/CMU/DIMM failures.
- **Test(s):**
  1. Blacklisted IOU3/PCIE5. Original config: ldg1 is using IOU3/PCIE5; entire root complex assigned to the LDOM.
  2. Blacklisted IOU3/IOB1. Original config: ldg1 is using IOU3/PCIE15; entire root complex assigned to the LDOM.
  3. Blacklisted CMU/CPU.
  4. Disabled DIMMs.
  *Note: recovery mode not enabled.*
- **What Happens:**
  1. The LDOM configuration boots normally; only the PCIE resource is missing from the LDOM when it is started.
  2. Falls back to factory-default: missing IO resources to boot the LDOM configuration.
  3. Falls back to factory-default: missing CPU strand.
  4. Falls back to factory-default: missing required memory.
- **Errors/Messages:**

  1.
  > NOTICE: Booting config = stef-4
  > DEBUG: Updating mdset-boot-reason prop: "0""stef-4""stef-4"
  > DEBUG: bootconfig differs from last boot
  > DEBUG: config stef-4 has 2 IO domains
  > DEBUG: Not in this IO domain
  > DEBUG: /SYS/IOU3/PCIE5 marked disabled in MD. Path=/@f80/@1/@0/@8
  > DEBUG: Updating stef-4 Control Domain's variables and keystore nodes
  > /SYS/IOU3/PCIE5 PCIE pci_50 ldg1 UNK

  2.
  > DEBUG: Some IO unreachable from cpu nodeset: Degraded IO config.
  > DEBUG: config_root_io_is_avail: Not enough RCs in the current config
  > WARNING: Missing IO resources to boot LDOM config
  > WARNING: Unable to boot stef-4 due to missing resources
  > DEBUG: Trying to fall back to degraded config
  > DEBUG: Can't open degraded cfg "stef-4" - rv = -6
  > DEBUG: Degraded config doesn't exist
  > WARNING: Falling back to factory-default
  > NOTICE: Booting config = factory-default
  > DEBUG: Updating mdset-boot-reason prop: "1""stef-4""factory-default"
  > DEBUG: bootconfig differs from last boot
  > DEBUG: ldm_set_bootedconfig_name: New bootedcfg factory-default, last bootedcfg stef-4

  3.
  > WARNING: bootconfig not bootable: missing strand id 2176
  > WARNING: Missing required strand resources to boot config
  > WARNING: Unable to boot stef-alternate due to missing resources
  > WARNING: Falling back to factory-default
  > NOTICE: Booting config = factory-default

  4.
  > WARNING: Missing guest memory [0x150000000000:0x158000000000]
  > WARNING: Missing required memory resources to boot config
  > WARNING: Unable to boot stef-mem due to missing resources
  > DEBUG: Trying to fall back to degraded config
  > DEBUG: Can't open degraded cfg "stef-mem" - rv = -6
  > DEBUG: Degraded config doesn't exist
  > WARNING: Falling back to factory-default
  > NOTICE: Booting config = factory-default
  > DEBUG: Updating mdset-boot-reason prop: "1""stef-mem""factory-default"

- **Recovery Options:** Replace ASAP.
- **Best Practices:** Enable recovery mode so a degraded configuration can load.
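When a configuration boots with a blacklisted PCIE slot (case 1 above), the affected devices appear with status UNK in the IO listing, matching the `/SYS/IOU3/PCIE5 ... UNK` line in the messages. A quick triage sketch from the control domain:

```
# ldm list-io | egrep 'NAME|UNK'   # IO devices whose root complex is unavailable show UNK
# fmadm faulty                     # FRUs diagnosed as faulted by FMA, with FRU paths
```

Cross-checking `ldm list-io` against `fmadm faulty` output helps confirm which FRU to replace before attempting to rebind the root complex to the guest.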
**6. HDD/network card failures in EMS**

- **Details of Changes/Faults:** HDD/network card failures in an EMS.
- **Test(s):** Blacklisted EMS2. Original config: ldg2 is using vnet and vdisk services from EMS2, owned by primary.
- **What Happens:** The LDOM configuration boots normally. The services (vdisk, vnet) are then unavailable to the LDOM.
- **Errors/Messages:**

  > NOTICE: Booting config = stef-4
  > DEBUG: Updating mdset-boot-reason prop: "0""stef-4""stef-4"
  > DEBUG: bootconfig differs from last boot
  > DEBUG: config stef-4 has 2 IO domains
  > DEBUG: /SYS/IOU3/EMS2 marked disabled in MD. Path=/@1100/@1/@0/@0
  > DEBUG: /SYS/IOU3/EMS2 marked disabled in MD. Path=/@1100/@1/@0/@0/@0
  > DEBUG: Updating IO MD to offset 0x180006600000
  > DEBUG: Not in this IO domain
  > DEBUG: ldm_set_bootedconfig_name: New bootedcfg stef-4, last bootedcfg factory-default
  > /SYS/IOU3/EMS2/CARD/NET0 PCIE pci_56 primary UNK
  > /SYS/IOU3/EMS2/CARD/SCSI PCIE pci_56 primary UNK

- **Recovery Options:** Replace ASAP.
- **Best Practices:** None.
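For EMS failures the configuration itself boots, so the useful check is whether the guest's virtual services are still backed by healthy devices. A sketch using the domain names from the example above:

```
# ldm list-services primary   # vds/vsw services exported by the primary domain
# ldm list-bindings ldg2      # which vdisk/vnet instances of ldg2 map to those services
```

If the backing device for a vds volume or vsw sits behind the failed EMS, the guest's corresponding vdisk or vnet will be the one that stops responding, which narrows the replacement to that EMS card.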