Monday, February 5, 2018

Xen Server Switch Port Is in Err-Disabled Mode


Our network runs entirely on Cisco switches, including the 2960, 4500, and 3850 series. The virtual environment uses Citrix Xen and VMware products.

A couple of months ago, after the Xen environment was upgraded to 7.2, we started seeing switch ports go into the err-disabled state.



Several ports on different Cisco 2960S and 2960X switches keep going into the err-disabled state, which has caused server outages. It has happened to both bonded ports and plain access ports with no NIC teaming or bonding.

SW-MGMT1#sh int g0/12
GigabitEthernet0/12 is down, line protocol is down (err-disabled) 
  Hardware is Gigabit Ethernet, address is 189c.5d6b.3c0c (bia 189c.5d6b.3c0c)
  Description: l-2 10.9.12.27
  MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec, 
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Auto-duplex, Auto-speed, media type is 10/100/1000BaseTX
  input flow-control is off, output flow-control is unsupported 
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input never, output 3w3d, output hang never
  Last clearing of "show interface" counters never
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 0 bits/sec, 0 packets/sec
  5 minute output rate 0 bits/sec, 0 packets/sec
     1901183 packets input, 485681442 bytes, 0 no buffer
     Received 23 broadcasts (6 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 6 multicast, 0 pause input
     0 input packets with dribble condition detected
     4523326 packets output, 1258124208 bytes, 0 underruns
     0 output errors, 0 collisions, 4 interface resets
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 PAUSE output
     0 output buffer failures, 0 output buffers swapped out


SW-MGMT1#sh ver
Cisco IOS Software, C2960S Software (C2960S-UNIVERSALK9-M), Version 12.2(55)SE7, RELEASE SOFTWARE (fc1)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2013 by Cisco Systems, Inc.
Compiled Mon 28-Jan-13 10:28 by prod_rel_team
Image text-base: 0x00003000, data-base: 0x01B00000

ROM: Bootstrap program is Alpha board boot loader
BOOTLDR: C2960S Boot Loader (C2960S-HBOOT-M) Version 12.2(55r)SE, RELEASE SOFTWARE (fc1)

SW-FWMGMT1 uptime is 13 weeks, 2 days, 21 hours, 31 minutes
System returned to ROM by power-on
System restarted at 15:18:10 EDT Fri Nov 3 2017
System image file is "flash:/c2960s-universalk9-mz.122-55.SE7/c2960s-universalk9-mz.122-55.SE7.bin"


This product contains cryptographic features and is subject to United
States and local country laws governing import, export, transfer and
use. Delivery of Cisco cryptographic products does not imply
third-party authority to import, export, distribute or use encryption.
Importers, exporters, distributors and users are responsible for
compliance with U.S. and local country laws. By using this product you
agree to comply with applicable laws and regulations. If you are unable
to comply with U.S. and local laws, return this product immediately.

A summary of U.S. laws governing Cisco cryptographic products may be found at:
http://www.cisco.com/wwl/export/crypto/tool/stqrg.html

If you require further assistance please contact us by sending email to
export@cisco.com

cisco WS-C2960S-24TS-S (PowerPC) processor (revision J0) with 131072K bytes of memory.
Processor board ID FOC1647V1LN
Last reset from power-on
1 Virtual Ethernet interface
1 FastEthernet interface
26 Gigabit Ethernet interfaces
The password-recovery mechanism is enabled.

512K bytes of flash-simulated non-volatile configuration memory.
Base ethernet MAC Address       : 18:9C:5D:6B:3C:00
Motherboard assembly number     : 73-12423-09
Power supply part number        : 341-0328-03
Motherboard serial number       : FOC1647101J
Power supply serial number      : DCA1644M7WA
Model revision number           : J0
Motherboard revision number     : A0
Model number                    : WS-C2960S-24TS-S
Daughterboard assembly number   : 73-11933-04
Daughterboard serial number     : FOC16467T1G
System serial number            : FOC1647V1LN
Top Assembly Part Number        : 800-32448-04
Top Assembly Revision Number    : B0
Version ID                      : V04
CLEI Code Number                : COMGJ00ARD
Daughterboard revision number   : A0
Hardware Board Revision Number  : 0x01


Switch Ports Model              SW Version            SW Image                 
------ ----- -----              ----------            ----------               
*    1 26    WS-C2960S-24TS-S   12.2(55)SE7           C2960S-UNIVERSALK9-M     



Two switch ports on this 2960S (SW-MGMT1) connect to the Citrix Xen server. One of them seems fine; it is always g0/12 that ends up err-disabled.

According to this Citrix discussion post: https://discussions.citrix.com/topic/391523-after-upgrade-70-to-72-load-cisco-switch-ports-shutdown-due-to-err-disabled/

"A loopback error occurs when the keepalive packet is looped back to the port that sent the keepalive. The switch sends keepalives out all the interfaces by default.
A device can loop the packets back to the source interface, which usually occurs because there is a logical loop in the network that the spanning tree has not blocked.
The source interface receives the keepalive packet that it sent out, and the switch disables the interface (errdisable).
This message occurs because the keepalive packet is looped back to the port that sent the keepalive:
%PM-4-ERR_DISABLE: loopback error detected on Gi4/1, putting Gi4/1 in err-disable state
Keepalives are sent on all interfaces by default."

A switch port can end up err-disabled if the software (IOS or CatOS) detects an error condition on the port. The port is effectively shut down until it is re-enabled, either manually or automatically if a recovery timer is configured for that error condition.


One such error condition results from a loopback on the port. The switch sends keepalive packets out all interfaces. If a keepalive packet is received on the same interface it was sent from, then a loop exists that has not been blocked by Spanning Tree Protocol, and the switch err-disables the port.

When this happens, the logs show that a loopback error was detected:


000305: Nov  5 21:23:59.177 EST: %ETHCNTR-3-LOOP_BACK_DETECTED: Loop-back detected on GigabitEthernet0/12.

000306: Nov  5 21:23:59.177 EST: %PM-4-ERR_DISABLE: loopback error detected on Gi0/12, putting Gi0/12 in err-disable state

000307: Nov  5 21:24:00.179 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/12, changed state to down

000308: Nov  5 21:24:01.186 EST: %LINK-3-UPDOWN: Interface GigabitEthernet0/12, changed state to down
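
To list every port that is currently err-disabled together with the reason, the two show commands below are a reasonable starting point. This is only a sketch: the prompt is copied from the session above, and the output is omitted because it varies by platform and IOS release.

SW-MGMT1#show interfaces status err-disabled
SW-MGMT1#show errdisable detect

The first command lists the affected ports and the cause (loopback in this case). The second shows which error causes have detection enabled on the switch.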


Solutions:

1. Quick / temp fix
The quick fix is easy: shut the interface down, then bring it back up with no shutdown.

SW-MGMT(config)#int g0/12
SW-MGMT(config-if)#shu
SW-MGMT(config-if)#
SW-MGMT(config-if)#no shu
SW-MGMT(config-if)#


005014: Feb  5 11:55:13.089 EST: %PARSER-5-CFGLOG_LOGGEDCMD: User:admin  logged command:interface GigabitEthernet0/12 
005015: Feb  5 11:55:18.594 EST: %PARSER-5-CFGLOG_LOGGEDCMD: User:admin  logged command:shutdown 
005016: Feb  5 11:55:19.863 EST: %PARSER-5-CFGLOG_LOGGEDCMD: User:admin  logged command:no shutdown 
005018: Feb  5 11:55:24.980 EST: %LINK-3-UPDOWN: Interface GigabitEthernet0/12, changed state to up
005019: Feb  5 11:55:25.982 EST: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/12, changed state to up


SW-MGMT#sh int g0/12
GigabitEthernet0/12 is up, line protocol is up (connected) 
  Hardware is Gigabit Ethernet, address is 189c.5d6b.3c0c (bia 189c.5d6b.3c0c)
  Description: l-2 10.9.12.27
  MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec, 
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 1000Mb/s, media type is 10/100/1000BaseTX
  input flow-control is off, output flow-control is unsupported 
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input never, output 00:00:00, output hang never
  Last clearing of "show interface" counters never
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 1000 bits/sec, 1 packets/sec
  5 minute output rate 1000 bits/sec, 1 packets/sec
     1901190 packets input, 485682616 bytes, 0 no buffer
     Received 24 broadcasts (6 multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 6 multicast, 0 pause input
     0 input packets with dribble condition detected
     4523338 packets output, 1258126740 bytes, 0 underruns
     0 output errors, 0 collisions, 5 interface resets
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 PAUSE output
     0 output buffer failures, 0 output buffers swapped out



2. Permanent Fix 
So far, there is no official patch or fix from Citrix. On the Cisco side, there are two ways to work around it:

2.1 Disable Keepalive on specific interface(s)

Issue the no keepalive interface command to stop the switch from sending keepalive packets on the affected interfaces. Disabling keepalives prevents the interface from being err-disabled, but it does not remove the loop. The drawback is that you first need to collect the list of ports where the issue has already occurred.
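
As a minimal sketch of this option, assuming g0/12 is the only affected port on this switch (substitute the ports collected from your own logs), the configuration would look roughly like this. Lines starting with ! are comments.

! Assumption: Gi0/12 is the port that keeps hitting the loopback error
interface GigabitEthernet0/12
 ! Stop sending keepalive frames on this port only
 no keepalive
end

Afterwards, sh int g0/12 should report "Keepalive not set" instead of "Keepalive set (10 sec)".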

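2.2 Enable auto-recovery for this type of err-disable condition
With this approach, you do not need to know which ports are affected. These are global commands on the switch and apply to all ports. The interval is in seconds, so 600 means an err-disabled port is re-enabled automatically after 10 minutes. The loopback cause is the one that matches the error seen here.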

errdisable recovery interval 600
errdisable recovery cause link-flap
errdisable recovery cause udld
errdisable recovery cause bpduguard
errdisable recovery cause loopback
errdisable recovery cause psecure-violation
errdisable recovery cause dcbx-error
errdisable recovery cause pause-rate-limit
errdisable recovery cause inline-power
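
Once the commands above are in place, the recovery settings can be checked with the show command below. This is a sketch only: the prompt is copied from the earlier session, and the output is omitted because the list of causes differs between IOS releases.

SW-MGMT1#show errdisable recovery

The output lists each cause with its timer status, the configured interval, and any interfaces currently waiting to be re-enabled.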






