I ran into a networking issue the other day. It wasn’t anything new or especially exciting, but it taught me a lesson about cut-through switching vs. store-and-forward switching. Sharing this one for my fellow non-network engineers:
From the server engineering side, we were seeing frequent receive CRC errors and drops on multiple server interfaces (ESXi hosts, HP Virtual Connect modules) in a particular datacenter row. These devices were connected to Nexus 5Ks, either directly or via Nexus 2Ks.
Depending on the model, a network switch forwards frames using either the store-and-forward method or the cut-through method. I believe Cisco Catalyst switches and the Cisco Nexus 7000 series typically use store-and-forward, while the Cisco Nexus 5000 and 2000 series use cut-through. (Network engineers: please correct me if that is wrong.)
Store-and-forward: the switch receives the entire frame and checks its CRC before sending it on. When CRC errors are generated, e.g. by a device network interface with a bad cable, the corruption is detected and the frame is dropped before it is forwarded, so it does not get propagated across the network or “switching domain”.
Cut-through: the switch starts forwarding a frame before it has received all of it. When CRC errors are generated, e.g. by a device network interface with a bad cable, the corrupted frames may get propagated, because the CRC sits at the end of the frame and a cut-through switch can’t verify the frame’s integrity before forwarding it. Cut-through is good because it reduces switching latency, but it can cause CRC errors to “spread like the plague”.
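For anyone who wants to see the difference in miniature, here is a rough Python sketch of the two behaviors. It is a toy model, not how any real switch is implemented: the function names are made up for illustration, `zlib.crc32` stands in for the Ethernet frame check sequence, and a "switch" is just a string naming its forwarding mode. The model assumes that a cut-through switch still notices the bad CRC once the end of the frame arrives and counts the error, but has already passed the frame along by then.

```python
import zlib


def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 trailer (a stand-in for the Ethernet FCS)."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def fcs_ok(frame: bytes) -> bool:
    """Recompute the CRC over the payload and compare it to the trailer."""
    payload, trailer = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "little") == trailer


def corrupt(frame: bytes) -> bytes:
    """Flip one bit in the first byte, as a bad cable might."""
    return bytes([frame[0] ^ 0x01]) + frame[1:]


def send_through(frame: bytes, switch_modes: list) -> tuple:
    """Pass a frame through a chain of switches.

    Returns (delivered, crc_errors), where crc_errors holds one entry
    per switch the frame actually reached: 1 if that switch saw a bad
    CRC on receive, 0 otherwise.
    """
    crc_errors = []
    for mode in switch_modes:
        bad = not fcs_ok(frame)
        crc_errors.append(1 if bad else 0)
        if bad and mode == "store-and-forward":
            # The whole frame was buffered and checked, so it is dropped here.
            return False, crc_errors
        # Cut-through: the frame was already on its way out before the
        # CRC could be checked, so the next hop receives it, errors and all.
    return True, crc_errors


bad_frame = corrupt(make_frame(b"hello from a host with a bad cable"))

# Three cut-through hops: every switch logs the error, and the frame still arrives.
print(send_through(bad_frame, ["cut-through"] * 3))

# Three store-and-forward hops: the first switch drops it, the rest never see it.
print(send_through(bad_frame, ["store-and-forward"] * 3))
```

In the cut-through case every hop along the path records a receive CRC error for the same bad frame, which matches what we saw: errors showing up on many interfaces that had nothing wrong with them.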
I worked with a friendly network engineer to identify the device and interface that were the source of these propagating errors, and then we replaced the cable.
There were a couple of good sources of info that were very helpful for this one:
- Past Cisco Live sessions on Nexus 5K/2K troubleshooting that are available for download