OTV + Nexus 7000 + “lost RARPs bug” causes vMotioned VMs to briefly lose connectivity

November 6, 2014 — 3 Comments

If your network uses Cisco OTV with Nexus 7Ks to extend Layer 2 connectivity between sites, be aware of a bug (CSCuq54506) that may cause brief network connectivity issues for VMs that are vMotioned between the sites. The first symptom you may notice is that a VM drops pings or loses connectivity for up to 1-2 minutes after it is vMotioned between sites. Following a vMotion, the destination ESXi host sends RARP traffic on behalf of the VM so that the switches update their MAC tables with the VM’s new location. When this bug occurs, those RARP frames don’t reach all of the switches at the source site, so those switches are left with stale MAC table entries that still point to the VM’s old location. (Note: Missing portfast on the source or destination host switch ports can cause similar-looking symptoms, so it’s a good idea to confirm that portfast is enabled on all of those ports.)
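To make the RARP notification concrete, here is a minimal sketch of what such a frame looks like on the wire. The layout follows the standard RARP packet format (EtherType 0x8035, opcode 3); the choice of zeroed IP fields and the helper names `mac_to_bytes` and `build_rarp_frame` are assumptions for illustration, not something taken from ESXi itself.

```python
import struct

ETHERTYPE_RARP = 0x8035  # EtherType assigned to RARP
RARP_OP_REQUEST = 3      # "reverse request" opcode


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def build_rarp_frame(vm_mac: str) -> bytes:
    """Build a broadcast RARP frame sourced from the VM's MAC (hypothetical helper).

    Switches learn the source MAC of the frame on the port it arrives on,
    which is how a frame like this refreshes their MAC tables.
    """
    mac = mac_to_bytes(vm_mac)
    broadcast = b"\xff" * 6
    eth_header = broadcast + mac + struct.pack("!H", ETHERTYPE_RARP)
    # hardware type 1 (Ethernet), protocol type 0x0800 (IPv4),
    # hardware length 6, protocol length 4, opcode 3
    payload = struct.pack("!HHBBH", 1, 0x0800, 6, 4, RARP_OP_REQUEST)
    # sender MAC, sender IP, target MAC, target IP
    # (IP fields zeroed here as an assumption)
    payload += mac + b"\x00" * 4 + mac + b"\x00" * 4
    return eth_header + payload
```

If a frame like this never reaches a switch at the source site, that switch has nothing to relearn the VM's MAC from until the VM's own traffic or MAC aging corrects the entry.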

You can use the following steps from the VMware side to help identify whether you are hitting the bug:

  • Start two continuous pings to a test VM: one from the site you are vMotioning from, and one from the site you are vMotioning to.
  • vMotion the test VM from one site to the other.
  • If the continuous ping at the source site (the site the VM was vMotioned from) drops for 30-60 seconds, but the continuous ping at the destination site (the site the VM was vMotioned to) stays up or drops only a single packet, then you may want to work with Cisco TAC to determine whether the root cause is this bug.
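The test above can also be scripted. Below is a minimal sketch, assuming a Linux host with the iputils `ping` command at each site. The function names and the 2-second destination threshold are assumptions chosen for illustration. The 30-second source threshold comes from the symptom described above.

```python
import subprocess
import time


def ping_once(host, timeout_s=1):
    """Send one ICMP echo; True if a reply arrived (assumes Linux iputils flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def collect(host, duration_s, interval_s=1.0):
    """Probe the host repeatedly and return a list of (timestamp, reachable)."""
    samples = []
    end = time.time() + duration_s
    while time.time() < end:
        samples.append((time.time(), ping_once(host)))
        time.sleep(interval_s)
    return samples


def longest_outage(samples):
    """Longest gap in seconds from a first failed probe to the next success."""
    longest = 0.0
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            longest = max(longest, ts - start)
            start = None
    if start is not None and samples:
        # Outage still in progress when sampling stopped.
        longest = max(longest, samples[-1][0] - start)
    return longest


def matches_symptom(source_samples, dest_samples,
                    source_min_s=30.0, dest_max_s=2.0):
    """True if the source site saw a long outage while the destination did not."""
    return (longest_outage(source_samples) >= source_min_s
            and longest_outage(dest_samples) <= dest_max_s)
```

Run `collect()` against the test VM from a machine at each site while the vMotion takes place, then pass both result lists to `matches_symptom()`. A `True` result only means the pattern matches the symptom described here; confirming the root cause still requires Cisco TAC.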

 


3 responses to OTV + Nexus 7000 + “lost RARPs bug” causes vMotioned VMs to briefly lose connectivity

  1. 

    Hi,

    Did you resolve the issue? Have you contacted TAC?
    Our customer has the same issue with a Nexus 7700 (C7706) with F3 cards and NX-OS 6.2.10. We have opened a TAC case, but your info could be really valuable if you have also opened a TAC case and resolved the issue. Let me know.

    Regards,
    Laurent

  2. 

    Hi Laurent, Sorry for the delayed response. Please reach out to @cjordanVA on Twitter for details on this one. Thanks!

  3. 

    Hi Laurent, we have a similar issue, but we are running 6.2(6), and Cisco recommends upgrading to 6.2(10).

    But if you faced this same issue even on 6.2(10), what would be the fix?

    Kindly advise.
