Cisco UCS Manager Troubleshooting Reference Guide
Troubleshooting Cisco UCS Manager Initial Configuration
You can verify that both fabric interconnect configurations are complete by logging in to the fabric interconnect through SSH and checking the cluster status in the CLI. For a walkthrough of this procedure, you can watch Cisco UCS Manager Initial Setup part 3.
Use the following commands to verify the cluster state:
show cluster state
Displays the operational state and leadership role for both fabric interconnects in a high availability cluster.
The following example displays that both fabric interconnects are in the Up state, HA is in the Ready state, fabric interconnect A has the primary role, and fabric interconnect B has the subordinate role:
UCS-A# show cluster state
Cluster Id: 0x4432f72a371511de-0xb97c000de1b1ada4

A: UP, PRIMARY
B: UP, SUBORDINATE

HA READY
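If you check the cluster state from a script rather than by eye, the output above can be parsed line by line. The following Python sketch assumes the output format shown in the example (one "A:" or "B:" line per fabric interconnect, followed by "HA READY"). It does not connect to the fabric interconnect, so how you capture the SSH output is up to you, and the class and function names are illustrative only.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Matches the per-fabric summary lines, for example "A: UP, PRIMARY".
MEMBER_RE = re.compile(r"^([AB]):\s*(\w+),\s*(\w+)$")


@dataclass
class ClusterState:
    cluster_id: str | None = None
    members: dict[str, tuple[str, str]] = field(default_factory=dict)  # fabric -> (state, role)
    ha_ready: bool = False

    def is_healthy(self) -> bool:
        """True when both fabrics are UP, one is PRIMARY, one is SUBORDINATE, and HA is READY."""
        if set(self.members) != {"A", "B"}:
            return False
        states = [state for state, _ in self.members.values()]
        roles = sorted(role for _, role in self.members.values())
        return self.ha_ready and states == ["UP", "UP"] and roles == ["PRIMARY", "SUBORDINATE"]


def parse_cluster_state(text: str) -> ClusterState:
    result = ClusterState()
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Cluster Id:"):
            result.cluster_id = line.split(":", 1)[1].strip()
        elif m := MEMBER_RE.match(line):
            result.members[m.group(1)] = (m.group(2), m.group(3))
        elif line == "HA READY":  # "HA NOT READY" deliberately does not match
            result.ha_ready = True
    return result


# The example output shown above, captured from "show cluster state".
SAMPLE = """\
UCS-A# show cluster state
Cluster Id: 0x4432f72a371511de-0xb97c000de1b1ada4

A: UP, PRIMARY
B: UP, SUBORDINATE

HA READY
"""

if __name__ == "__main__":
    state = parse_cluster_state(SAMPLE)
    print(state.cluster_id, state.members, "healthy" if state.is_healthy() else "check cluster")
```

Any combination other than two UP members, one PRIMARY and one SUBORDINATE, plus HA READY is reported as unhealthy, which is the point at which show cluster extended-state becomes useful.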
show cluster extended-state
Displays extended details about the cluster state. This command is typically used when troubleshooting issues.
UCSC# show cluster extended-state
0x2e95deacbd0f11e2-0x8ff35147e84f3de2

Start time: Thu May 16 06:54:22 2013
Last election time: Thu May 16 16:29:28 2015

A: UP, PRIMARY
B: UP, SUBORDINATE

A: memb state UP, lead state PRIMARY, mgmt services state: UP
B: memb state UP, lead state SUBORDINATE, mgmt services state: UP
   heartbeat state PRIMARY_OK

HA READY
Detailed state of the device selected for HA quorum data:
   Device 1007, serial: a66b4c20-8692-11df-bd63-1b72ef3ac801, state: active
   Device 1010, serial: 00e3e6d0-8693-11df-9e10-0f4428357744, state: active
   Device 1012, serial: 1d8922c8-8693-11df-9133-89fa154e3fa1, state: active
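The extended output adds per-member state, the heartbeat state, and the state of each device selected for HA quorum data. A minimal Python sketch that flags anything other than the values seen in the example above (UP, PRIMARY_OK, HA READY, and active quorum devices) might look like this. The line formats are taken from that example and may differ in other releases; extended_state_problems is a hypothetical helper name.

```python
from __future__ import annotations

import re

# Line shapes taken from the "show cluster extended-state" example above.
MEMBER_RE = re.compile(
    r"^([AB]): memb state (\w+), lead state (\w+), mgmt services state: (\w+)$"
)
DEVICE_RE = re.compile(r"^Device (\d+), serial: ([0-9a-f-]+), state: (\w+)$")


def extended_state_problems(text: str) -> list[str]:
    """Return human-readable problems; an empty list means nothing abnormal was found."""
    members: dict[str, tuple[str, str, str]] = {}
    devices: dict[str, str] = {}
    heartbeat = None
    ha_ready = False
    for raw in text.splitlines():
        line = raw.strip()
        if m := MEMBER_RE.match(line):
            members[m.group(1)] = (m.group(2), m.group(3), m.group(4))
        elif m := DEVICE_RE.match(line):
            devices[m.group(1)] = m.group(3)
        elif line.startswith("heartbeat state "):
            heartbeat = line.split()[-1]
        elif line == "HA READY":
            ha_ready = True

    problems = []
    for fabric in ("A", "B"):
        if fabric not in members:
            problems.append(f"fabric interconnect {fabric}: no membership line")
            continue
        memb, _lead, mgmt = members[fabric]
        if memb != "UP":
            problems.append(f"fabric interconnect {fabric}: memb state {memb}")
        if mgmt != "UP":
            problems.append(f"fabric interconnect {fabric}: mgmt services state {mgmt}")
    if heartbeat != "PRIMARY_OK":
        problems.append(f"heartbeat state {heartbeat}")
    if not ha_ready:
        problems.append("HA not ready")
    if not devices:
        problems.append("no HA quorum devices listed")
    for device, state in devices.items():
        if state != "active":
            problems.append(f"quorum device {device}: state {state}")
    return problems


SAMPLE = """\
UCSC# show cluster extended-state
0x2e95deacbd0f11e2-0x8ff35147e84f3de2

Start time: Thu May 16 06:54:22 2013
Last election time: Thu May 16 16:29:28 2015

A: UP, PRIMARY
B: UP, SUBORDINATE

A: memb state UP, lead state PRIMARY, mgmt services state: UP
B: memb state UP, lead state SUBORDINATE, mgmt services state: UP
   heartbeat state PRIMARY_OK

HA READY
Detailed state of the device selected for HA quorum data:
   Device 1007, serial: a66b4c20-8692-11df-bd63-1b72ef3ac801, state: active
   Device 1010, serial: 00e3e6d0-8693-11df-9e10-0f4428357744, state: active
   Device 1012, serial: 1d8922c8-8693-11df-9133-89fa154e3fa1, state: active
"""

if __name__ == "__main__":
    print(extended_state_problems(SAMPLE) or "no problems found")
```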
Troubleshooting Boot Issues
Problem—The system fails to produce a reboot warning that lists any dependencies.
Possible Cause—This problem can be caused by changes to a vNIC template or a vHBA template. Reboot warnings occur when the back-end returns a list of dependencies. When you update the template type for a vNIC or vHBA template and make changes to any boot-related properties without applying changes between steps, the back-end systems are not triggered to return a list of dependencies.
Launch the Cisco UCS Manager GUI.
In the vNIC template or vHBA template included in the service profile, do the following:
Make any additional changes to the reboot-related values and click Save Changes.
A reboot warning and the list of dependencies are displayed.
Problem—The eUSB embedded inside the Cisco UCS server includes an operating system. However, the server does not boot from that operating system.
Possible Cause—This problem can occur when, after associating the server with the service profile, the eUSB is not at the top of the actual boot order for the server.
Launch the Cisco UCS Manager GUI.
On Servers, do the following to verify the boot policy configuration:
On Equipment, do the following to verify the actual boot order for the server:
If the eUSB is not the first device in the actual boot order, do the following:
Problem—The server does not boot from the operating system after a RAID1 cluster migration. The RAID LUN remains in the “inactive” state during and after service profile association. As a result, the server cannot boot.
Possible Cause—This problem can occur if the local disk configuration policy in the service profile on the server is configured with Any Configuration mode rather than RAID1.
In the Cisco UCS Manager GUI, click Servers.
Navigate to the service profile associated with the server and click the Storage tab.
Do one of the following:
Troubleshooting KVM Issues
Problem—The BadFieldException error appears when the KVM viewer is launched.
Possible Cause—This problem can occur because Java Web Start disables the cache by default when it is used with an application that uses native libraries.
Choose Start > Control Panel > Java.
Click the General tab.
In the Temporary Internet Files area, click Settings.
Click the Keep temporary files on my computer check box.
Problem—The KVM console fails to launch and the JRE displays the following message:
Unable to launch the application.
Possible Cause—This problem can occur if several KVM consoles are launched simultaneously.
If possible, close all of the open KVM consoles.
Relaunch the KVM consoles one at a time.
Problem—The first time you attempt to open the KVM on a server, the KVM fails to launch.
Possible Cause—This problem can be caused by a JRE version incompatibility.
Upgrade to JRE 1.6.0_11 (Java 6 Update 11).
Reboot the server.
Launch the KVM console.
Troubleshooting VM Issues
Problem—The following error displays:
Currently connected network interface x uses Distributed Virtual Switch (uuid:y) which is accessed on the host via a switch that has no free ports.
Possible Cause—This problem can be caused by one of the following issues:
Identify what you were doing when the error displayed.
If the error resulted from powering off a VM, or from migrating a VM from one host to another, do the following:
If the error resulted from migrating a VM instance from one datastore to another datastore on the same server, do the following:
Troubleshooting Cisco UCS Manager Issues
Problem—When you run Cisco UCS Manager CLI commands, the Cisco UCS Manager CLI displays the following message:
Software Error: Exception during execution: [Error: Timed out communicating with DME]
Possible Cause—This problem occurs when the DME process on the primary fabric interconnect is either unresponsive or has crashed, and is not in the running state. Other symptoms that appear when the DME is down are:
Gather information about the sequence of events, such as a Cisco UCS Manager upgrade or configuration changes, that led the system to this state.
Connect to each fabric interconnect by using its individual IP address, and verify the cluster status, processes, and core dumps by using the following commands:
Identify the primary fabric interconnect, and determine whether the HA election is incomplete.
Review the NX-OS logs for fabric interconnect hardware issues by using the following commands:
Collect technical support information for Cisco UCS Manager from local-mgmt CLI by using the following commands:
Contact TAC with these logs and information to further investigate the failure.
Problem—After coming back from sleep mode, the Cisco UCS Manager GUI displays the following message:
Fatal error: event sequencing is skewed.
Possible Cause—This problem can occur if the Cisco UCS Manager GUI was running when the computer went to sleep. Because the JRE does not have a sleep detection mechanism, the system cannot retrack all of the messages received before it went into sleep mode. After multiple retries, this event sequencing error is logged.
Always shut down the Cisco UCS Manager GUI before putting your computer to sleep.
In the Cisco UCS Manager GUI, if a Connection Error dialog box is displayed, click one of the following:
Troubleshooting Fabric Interconnect Issues
If the fabric interconnect fails to start, you may have one of the following issues:
If either of these issues exists, you might need to use the boot loader prompt to recover the fabric interconnect.
Contact the Cisco Technical Assistance Center (TAC) to obtain the firmware recovery images and information about how to recover the fabric interconnect from the boot loader prompt.
Problem—When you set up two fabric interconnects to support a high availability cluster and connect the L1 ports and L2 ports, a fabric interconnect cluster ID mismatch can occur. This type of mismatch means that the cluster fails and Cisco UCS Manager cannot be initialized.
In the Cisco UCS Manager CLI, connect to fabric interconnect B and execute erase configuration.
All configuration on the fabric interconnect is erased.
Reboot fabric interconnect B.
After rebooting, fabric interconnect B detects the presence of fabric interconnect A and downloads the cluster ID from fabric interconnect A. You need to configure the subordinate fabric interconnect for the cluster configuration.
When the unconfigured system boots, it prompts you for the setup method to be used. Enter console to continue the initial setup using the console CLI.
Note: The fabric interconnect should detect the peer fabric interconnect in the cluster. If it does not, check the physical connections between the L1 and L2 ports, and verify that the peer fabric interconnect has been enabled for a cluster configuration.
Enter y to add the subordinate fabric interconnect to the cluster.
Enter the admin password of the peer fabric interconnect.
Enter the IP address for the management port on the subordinate fabric interconnect.
Review the setup summary and enter yes to save and apply the settings, or enter no to go through the Setup wizard again to change some of the settings.
If you choose to go through the Setup wizard again, it displays the values you previously entered in brackets. To accept a previously entered value, press Enter.
Troubleshooting Server Disk Drive Detection and Monitoring
The type of monitoring supported depends upon the Cisco UCS server.
Through Cisco UCS Manager, you can monitor local storage components for the following servers:
Not all servers support all local storage components. For Cisco UCS rack servers, the onboard SATA RAID 0/1 controller integrated on the motherboard is not supported.
Only legacy disk drive monitoring is supported through Cisco UCS Manager for the following servers:
For Cisco UCS Manager to monitor the disk drives, the 1064E storage controller must have a firmware level contained in a UCS bundle with a package version of 2.0(1) or higher.
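A minimum-version check such as this 2.0(1) requirement can be automated by comparing release strings. The sketch below assumes versions of the form major.minor(maintenance) with optional trailing build letters, such as 2.0(1) or 2.0(2i), and that build letters order alphabetically. It is an illustration of the comparison, not a Cisco-provided rule.

```python
from __future__ import annotations

import re

# Release strings such as "2.0(1)" or "2.0(2i)": major.minor(maintenance[build letters]).
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\((\d+)([a-z]*)\)$")


def parse_ucs_version(text: str) -> tuple[int, int, int, str]:
    m = VERSION_RE.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized UCS version string: {text!r}")
    major, minor, maint, build = m.groups()
    # Numeric fields compare as integers so that 2.0(10) sorts after 2.0(2);
    # build letters compare alphabetically, and "no letter" sorts before "a".
    return int(major), int(minor), int(maint), build


def meets_minimum(version: str, minimum: str) -> bool:
    return parse_ucs_version(version) >= parse_ucs_version(minimum)


if __name__ == "__main__":
    print(meets_minimum("2.0(2i)", "2.0(1)"))  # True
    print(meets_minimum("1.4(3q)", "2.0(1)"))  # False
```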
These prerequisites must be met for local storage monitoring or legacy disk drive monitoring to provide useful status information:
Viewing the Status of a Disk Drive
In the Navigation pane, click Equipment.
Expand Equipment > Chassis > Chassis Number > Servers.
Click the server for which you want to view the status of your local storage components.
In the Work pane, click the Inventory tab.
Click the Storage subtab to view the status of your RAID controllers and any FlexFlash controllers.
Click the down arrows to expand the Local Disk Configuration Policy, Actual Disk Configurations, Disks, and Firmware bars and view additional status information.
Cisco UCS Manager displays the following properties for each monitored disk drive:
You need to look at both properties to determine the status of the monitored disk drive. The following table shows the likely interpretations of the combined property values.
No fault condition. The disk drive is in the server and can be used.
Fault condition. The disk drive is in the server, but one of the following could be causing an operability problem:
Fault condition. The server drive bay does not contain a disk drive.
Fault condition. The disk drive is in the server, but one of the following could be causing an operability problem:
The Operability field might show the incorrect status for several reasons, such as if the disk is part of a broken RAID set or if the BIOS power-on self-test (POST) has not completed.
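The property values for each row of the table above are not reproduced in this section. The following Python sketch encodes the combined interpretation under a stated assumption: that the Operability property reports Operable, Inoperable, or Unknown and that the Presence property reports Equipped or Missing (Equipped appears in the sample output later in this chapter). Verify these labels against what your Cisco UCS Manager GUI actually displays before relying on the sketch.

```python
from __future__ import annotations

# ASSUMPTION: these GUI labels are not reproduced in this section;
# adjust them to match what your Cisco UCS Manager release displays.
EQUIPPED, MISSING = "Equipped", "Missing"
OPERABLE = "Operable"
PROBLEM_OPERABILITY = {"Inoperable", "Unknown"}


def interpret_disk(operability: str, presence: str) -> tuple[bool | None, str]:
    """Return (is_fault, meaning); is_fault is None when the combination is not recognized."""
    if presence == MISSING:
        return True, "fault: the drive bay does not contain a disk drive"
    if presence == EQUIPPED and operability == OPERABLE:
        return False, "no fault: the drive is present and can be used"
    if presence == EQUIPPED and operability in PROBLEM_OPERABILITY:
        # Operability can be wrong if the disk is part of a broken RAID set or
        # BIOS POST has not completed, so recheck after POST finishes.
        return True, "fault: the drive is present but has an operability problem"
    return None, f"unrecognized combination: {operability!r}/{presence!r}"


if __name__ == "__main__":
    for combo in [("Operable", "Equipped"), ("Inoperable", "Equipped"), ("N/A", "Missing")]:
        print(combo, interpret_disk(*combo))
```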
Problem—After hot-swapping, removing, or adding a hard drive, the updated hard disk drive (HDD) metrics do not appear in the Cisco UCS Manager GUI.
Possible Cause—This problem can occur because Cisco UCS Manager gathers HDD metrics only during a system boot. If a hard drive is added or removed after a system boot, the Cisco UCS Manager GUI does not update the HDD metrics.
Reboot the server.
Problem—The fault LED is illuminated or blinking on the server disk drive, but Cisco UCS Manager does not indicate a disk drive failure.
Possible Cause—The disk drive fault detection tests failed due to one or more of the following conditions:
Monitor the fault LEDs of each disk drive in the affected server(s).
If a fault LED on a server turns any color, such as amber, or blinks for no apparent reason, create a technical support file for each affected server and contact the Cisco Technical Assistance Center.
Problem—Cisco UCS Manager reports that a server has more disks than the total disk slots available in the server. For example, Cisco UCS Manager reports three disks for a server with two disk slots as follows:
RAID Controller 1:
  Local Disk 1:
    Product Name: 73GB 6Gb SAS 15K RPM SFF HDD/hot plug/drive sled mounted
    PID: A03-D073GC2
    Serial: D3B0P99001R9
    Presence: Equipped
  Local Disk 2:
    Product Name:
    Presence: Equipped
    Size (MB): Unknown
  Local Disk 5:
    Product Name: 73GB 6Gb SAS 15K RPM SFF HDD/hot plug/drive sled mounted
    Serial: D3B0P99001R9
    HW Rev: 0
    Size (MB): 70136
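To spot this symptom in saved inventory output, you can count the reported Local Disk entries and look for repeated serial numbers; in the example, Local Disk 1 and Local Disk 5 report the same serial number. This Python sketch assumes the indented layout shown in the example and that disk slots are numbered from 1. disk_report_problems is a hypothetical helper, not part of any Cisco tool.

```python
from __future__ import annotations

import re
from collections import Counter

DISK_RE = re.compile(r"^[ \t]*Local Disk (\d+):[ \t]*$", re.MULTILINE)
SERIAL_RE = re.compile(r"^[ \t]*Serial:[ \t]*(\S+)", re.MULTILINE)


def disk_report_problems(text: str, slot_count: int) -> list[str]:
    slots = [int(n) for n in DISK_RE.findall(text)]
    problems = []
    if len(slots) > slot_count:
        problems.append(f"{len(slots)} disks reported for {slot_count} slots")
    # ASSUMPTION: slots are numbered from 1, as in the example above.
    outside = [s for s in slots if not 1 <= s <= slot_count]
    if outside:
        problems.append(f"disk numbers outside 1-{slot_count}: {outside}")
    # The same physical drive should not appear under two disk entries.
    for serial, count in Counter(SERIAL_RE.findall(text)).items():
        if count > 1:
            problems.append(f"serial {serial} reported {count} times")
    return problems


SAMPLE = """\
RAID Controller 1:
  Local Disk 1:
    Product Name: 73GB 6Gb SAS 15K RPM SFF HDD/hot plug/drive sled mounted
    PID: A03-D073GC2
    Serial: D3B0P99001R9
    Presence: Equipped
  Local Disk 2:
    Product Name:
    Presence: Equipped
    Size (MB): Unknown
  Local Disk 5:
    Product Name: 73GB 6Gb SAS 15K RPM SFF HDD/hot plug/drive sled mounted
    Serial: D3B0P99001R9
    HW Rev: 0
    Size (MB): 70136
"""

if __name__ == "__main__":
    for problem in disk_report_problems(SAMPLE, slot_count=2):
        print(problem)
```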
Possible Cause—This problem is typically caused by a communication failure between Cisco UCS Manager and the server that reports the inaccurate information.
Upgrade the Cisco UCS domain to the latest release of Cisco UCS software and firmware.
Decommission the server.
Recommission the server.
Troubleshooting Post-Upgrade IQN Issues
Problem—After an upgrade from Cisco UCS Release 2.0(1) to Release 2.0(2), Cisco UCS Manager raises an IQN-related fault on one or more service profiles when you attempt to perform an action on a service profile, such as modifying the host firmware package.
Possible Cause—One or more iSCSI vNICs used within a single service profile or across multiple service profiles did not have a unique IQN initiator name.
Log in to the Cisco UCS Manager CLI.
Run the following command to view a list of the IQNs in the Cisco UCS domain:
UCS-A# show identity iqn | include iqn name
In Cisco UCS PowerTool, run the script to identify the iSCSI vNICs that include the duplicate IQNs.
In the service profile to which the IQN initiator name is not registered, change the initiator identity to the default IQN pool or manually assign a unique IQN.
In the service profile in which you changed the initiator identity, change the initiator assignment to the name or pool you assigned, as follows:
Note: This vNIC is not registered or visible through show identity iqn.
Changing initiator names also involves storage side configuration, which is beyond the scope of this document.
Perform an action on the service profile to register the initiator names in the Cisco UCS database.
For example, you can upgrade the firmware on the associated server or modify the description or label of the service profile.
Run the following command to verify that the IQN changes were registered:
UCS-A# show identity iqn | include iqn name
If a Cisco UCS domain is configured for iSCSI boot, before you upgrade from Cisco UCS Release 2.0(1) to Cisco UCS Release 2.0(2) or higher, you must ensure that all iSCSI vNICs used across multiple service profiles have unique initiator names.
You can use a script that runs in the Cisco UCS PowerTool to determine whether a Cisco UCS configuration for iSCSI boot includes duplicate IQNs.
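The PowerTool script itself is not reproduced in this section. To illustrate the check it performs, the following Python sketch groups iSCSI vNICs by initiator name and separates names reused within one service profile from names reused across service profiles, the two cases that Cisco UCS handles differently after the upgrade. It assumes you have already exported a list of (service profile, vNIC, initiator name) records; the record type, function names, and sample IQNs are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class IscsiVnic(NamedTuple):
    service_profile: str
    vnic: str
    initiator_name: str


def find_duplicate_iqns(vnics: list[IscsiVnic]) -> dict[str, list[IscsiVnic]]:
    """Group vNICs by initiator name and keep only names used more than once."""
    by_name: dict[str, list[IscsiVnic]] = defaultdict(list)
    for vnic in vnics:
        by_name[vnic.initiator_name].append(vnic)
    return {name: users for name, users in by_name.items() if len(users) > 1}


def classify_duplicates(dups: dict[str, list[IscsiVnic]]) -> tuple[set[str], set[str]]:
    """Split duplicate names into (reused within one service profile, reused across profiles)."""
    within, across = set(), set()
    for name, users in dups.items():
        profiles = [u.service_profile for u in users]
        if len(set(profiles)) < len(profiles):
            within.add(name)
        if len(set(profiles)) > 1:
            across.add(name)
    return within, across


if __name__ == "__main__":
    # Hypothetical inventory; in practice it would come from your own export.
    inventory = [
        IscsiVnic("sp-web1", "iscsi-a", "iqn.2012-01.com.example:web"),
        IscsiVnic("sp-web1", "iscsi-b", "iqn.2012-01.com.example:web"),
        IscsiVnic("sp-db1", "iscsi-a", "iqn.2012-01.com.example:db"),
        IscsiVnic("sp-db2", "iscsi-a", "iqn.2012-01.com.example:db"),
    ]
    dups = find_duplicate_iqns(inventory)
    for name, users in dups.items():
        print(len(users), name, [f"{u.service_profile}/{u.vnic}" for u in users])
    print(classify_duplicates(dups))
```

The printed count and initiator name mirror the Count and InitiatorName columns of the PowerTool output in this section.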
To download Cisco UCS PowerTool, do the following:
To launch Cisco UCS PowerTool, enter the following at a command line:
C:\Program Files (x86)\Cisco\Cisco UCS PowerTool>C:\Windows\System32\windowspowershell\v1.0\powershell.exe -NoExit -ExecutionPolicy RemoteSigned -File .\StartUcsPS.ps1
The following example shows what happens when you launch Cisco UCS PowerTool:
C:\Program Files (x86)\Cisco\Cisco UCS PowerTool>C:\Windows\System32\windowspowershell\v1.0\powershell.exe -NoExit -ExecutionPolicy RemoteSigned -File .\StartUcsPS.ps1
Windows PowerShell
Copyright (C) 2009 Microsoft Corporation. All rights reserved.
In Cisco UCS PowerTool, do the following:
cmdlet Connect-Ucs at command pipeline position 1
Supply values for the following parameters:
Credential
Cisco UCS PowerTool outputs the following to your screen after you log in.
Cookie                : 1331303969/2af0afde-6627-415c-b85f-a7cae6233de3
Domains               :
LastUpdateTime        : 3/9/2012 6:20:42 AM
Name                  : 209.165.201.15
NoSsl                 : False
NumPendingConfigs     : 0
NumWatchers           : 0
Port                  : 443
Priv                  :
RefreshPeriod         : 600
SessionId             : web_49846_A
TransactionInProgress : False
Ucs                   : ucs-4
Uri                   : https://209.165.201.15
UserName              : admin
VirtualIpv4Address    : 209.165.201.15
Version               : 2.0(2i)3.0(1a)
WatchThreadStatus     : None
In Cisco UCS PowerTool, run the following script to validate your iSCSI boot configuration and check for duplicate IQNs:
Cisco UCS PowerTool outputs the results to your screen, as follows:
Count InitiatorName                Dn
----- -------------                --
    2 iqn.2012-01.cisco.com:s.
(Optional) If you have .NET Framework 3.5 Service Pack 1 installed, you can use the following script to view the output in the GUI:
Disconnect from Cisco UCS Manager, as follows:
PS C:\> Disconnect-Ucs
If duplicate IQNs exist across multiple service profiles in the Cisco UCS domain, reconfigure the iSCSI vNICs with unique IQNs in Cisco UCS Manager before you upgrade to Cisco UCS Release 2.1 or later.
If you do not ensure that all iSCSI vNICs have unique initiator names across all service profiles in a Cisco UCS domain before you upgrade, Cisco UCS Manager raises a fault on the iSCSI vNICs to warn you that duplicate IQNs are present. Also, if a service profile contains duplicate IQN names (for example, the same name is used for both iSCSI vNICs), Cisco UCS reconfigures the service profile to have a single IQN. For information on how to clear this fault and reconfigure the duplicate IQNs, see the Cisco UCS B-Series Troubleshooting Guide.
Problem—After an upgrade from Cisco UCS Release 2.0(1) to Release 2.0(2), Cisco UCS Manager raises an IQN-related fault on one or more service profiles, and you cannot reconfigure the duplicate IQN initiator name on the service profile.
Possible Cause—The service profile that does not have a unique IQN initiator name is based on an updating service profile template.
Log in to the Cisco UCS Manager CLI.
UCS-A# scope org org-name
Enters organization mode for the specified organization. To enter the root organization mode, type / as the org-name.
UCS-A /org # scope service-profile profile-name
Enters service profile mode for the specified service profile.
UCS-A /org/service-profile # scope vnic-iscsi iscsi_vnic1_name
Enters the mode for the first iSCSI vNIC assigned to the service profile.
UCS-A /org/service-profile/vnic-iscsi* # set iscsi-identity
Specifies the name of the iSCSI initiator or the name of an IQN pool from which the iSCSI initiator name will be provided. The iSCSI initiator name can be up to 223 characters.
UCS-A /org/service-profile/vnic-iscsi* # exit
Exits the mode for the specified iSCSI vNIC.
UCS-A /org/service-profile* # scope vnic-iscsi iscsi_vnic2_name
Enters the mode for the second iSCSI vNIC assigned to the service profile.
UCS-A /org/service-profile/vnic-iscsi* # set iscsi-identity
Specifies the name of the iSCSI initiator or the name of an IQN pool from which the iSCSI initiator name will be provided. The iSCSI initiator name can be up to 223 characters.
UCS-A /org/service-profile/vnic-iscsi* # commit-buffer
Commits the transaction to the system configuration.
In the Cisco UCS Manager GUI, unbind the service profile from the updating service profile template.
Troubleshooting Issues with Registering Cisco UCS Domains in Cisco UCS Central
Date and time mismatch is the most common issue with registration.
To ensure that the date and time between Cisco UCS Central and Cisco UCS domains are in sync, try the following:
UCSC# connect policy-mgr
UCSC(policy-mgr)# scope org
UCSC(policy-mgr) /org# scope device-profile
UCSC(policy-mgr) /org/device-profile # scope security
UCSC(policy-mgr) /org/device-profile/security # scope keyring default
UCSC(policy-mgr) /org/device-profile/security/keyring* # set regenerate yes
UCSC(policy-mgr) /org/device-profile/security/keyring* # commit-buffer
UCSM# scope system
UCSM /system # scope control-ep policy
UCSM /system/control-ep # set shared-secret
Shared Secret for Registration:
UCSM /system/control-ep* # commit-buffer
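Because a date and time mismatch is the most common registration problem, it can help to quantify the skew to confirm that the clocks are actually in sync. The following Python sketch compares two clock readings taken at about the same moment, one from Cisco UCS Central and one from the Cisco UCS domain. How you obtain those readings is up to you, and the 60-second tolerance is a hypothetical value for illustration, not a documented Cisco limit.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# HYPOTHETICAL tolerance for illustration only; this is not a documented Cisco limit.
DEFAULT_TOLERANCE = timedelta(seconds=60)


def clock_skew(central: datetime, domain: datetime) -> timedelta:
    """Absolute difference between two clock readings taken at about the same moment."""
    if central.tzinfo is None or domain.tzinfo is None:
        raise ValueError("use timezone-aware datetimes so UTC offsets are accounted for")
    return abs(central - domain)


def clocks_in_sync(central: datetime, domain: datetime,
                   tolerance: timedelta = DEFAULT_TOLERANCE) -> bool:
    return clock_skew(central, domain) <= tolerance


if __name__ == "__main__":
    central = datetime(2013, 5, 16, 6, 54, 22, tzinfo=timezone.utc)
    domain = central + timedelta(minutes=7)
    print(clock_skew(central, domain))      # 0:07:00
    print(clocks_in_sync(central, domain))  # False
```

Requiring timezone-aware values avoids a false mismatch when the two systems report the same instant in different time zones.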
Before calling Cisco TAC, make sure that: