DEV Community

Nick Schmidt

Originally published at blog.engyak.co

Mellanox `nmlx5_core` driver `4.23` issues on ESXi 8.0 Update 1

Mellanox Driver Overview

Problem Inventory - Mellanox Driver Update on ESXi 8.0u1 causing network virtualization issues

After installing ESXi 8.0 Update 1, the following issues appear on affected nmlx5_core adapters:

  • Delayed or failed IP discovery on VLAN-backed segments, even between workloads on the same host. Once an address is in the ARP cache, the issue no longer occurs for it.
  • Delayed or failed IP discovery and IP allocation failures on VLAN-trunked port groups, even between workloads on the same host. These issues persist even after IP discovery has succeeded.
  • Overlay encapsulation offload failures:
    • ICMP of any payload size passes in both directions through Edge Transport Nodes / FRRLinux machines, but TCP and UDP do not.
    • All overlay traffic encapsulated by a vSphere host flows correctly between workloads on the same NSX overlay segment.
    • All overlay traffic encapsulated by a vSphere host flows correctly between segments on the same NSX distributed router.
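Because ping keeps working in this failure mode, ICMP alone cannot confirm the problem. Below is a minimal sketch of a TCP probe, using only the Python standard library, that could be run from a workload on either side of the Edge. The function name `probe_tcp` and its return labels are hypothetical, and it does not cover UDP, which has no handshake to observe.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP handshake to host:port and report what happened.

    Hypothetical helper for triage: a "timeout" while ping to the same host
    succeeds matches the overlay symptom described above.
    """
    try:
        # create_connection completes the full three-way handshake or raises.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # A RST came back, so TCP packets did traverse the path.
        return "refused"
    except socket.timeout:
        # No answer at all: consistent with TCP being dropped in transit.
        return "timeout"
    except OSError:
        # Unreachable network, DNS failure, and similar errors.
        return "error"
```

For example, `probe_tcp("192.0.2.10", 443)` against a hypothetical workload address behind the Edge: "timeout" alongside a working ping fits the pattern above, while "refused" shows that TCP did cross the path even though nothing is listening on that port.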

These issues are seen on the following hardware models:

  • MCX4121A-ACAT firmware revisions 14.25 and 14.32

These issues appeared with the upgrade to vSphere 8.0 Update 1, which includes the following updated driver:

nmlx5-core 4.23.0.36-8vmw.800.1.0.20513097

NVIDIA ships this driver as a single package that supports both BlueField SmartNICs and ConnectX Generation 5 network adapters. Rolling back to a previous release of ESXi 8 with the previous driver (nmlx5-core 4.22) immediately resolves all overlay issues.
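To keep track of which hosts match the reported combination, a small inventory check could look like the sketch below. The names `NicReport`, `driver_major_minor`, and `classify` are hypothetical; the model, firmware revisions, and driver versions come from this post, and the input strings are assumed to be copied from the host's NIC details (for example, the output of `esxcli network nic get -n <vmnic>`).

```python
from __future__ import annotations

from dataclasses import dataclass

# Model -> firmware revisions (major.minor) reported as affected in this post.
AFFECTED_FIRMWARE = {"MCX4121A-ACAT": {"14.25", "14.32"}}
AFFECTED_DRIVER = (4, 23)    # nmlx5-core 4.23.x, shipped with ESXi 8.0 Update 1
KNOWN_GOOD_DRIVER = (4, 22)  # rolling back to 4.22 cleared the overlay issues


@dataclass
class NicReport:
    """Hypothetical record of one adapter's details, as read from the host."""
    model: str     # e.g. "MCX4121A-ACAT"
    firmware: str  # e.g. "14.32.1010"; only major.minor is compared
    driver: str    # full nmlx5-core version string


def driver_major_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from e.g. '4.23.0.36-8vmw.800.1.0.20513097'."""
    upstream = version.split("-", 1)[0]  # keep only the part before the first '-'
    major, minor = upstream.split(".")[:2]
    return int(major), int(minor)


def classify(nic: NicReport) -> str:
    """Label an adapter against the combinations reported in this post."""
    if nic.model not in AFFECTED_FIRMWARE:
        return "not reported"
    firmware = ".".join(nic.firmware.split(".")[:2])
    driver = driver_major_minor(nic.driver)
    if driver == AFFECTED_DRIVER and firmware in AFFECTED_FIRMWARE[nic.model]:
        return "affected"
    if driver == KNOWN_GOOD_DRIVER:
        return "known-good driver"
    return "unknown"
```

An adapter labelled "unknown" is simply outside what has been observed here; it is not confirmed safe.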

No resolution has been found yet; this page will be updated when one is.

If anyone would like to contribute to this problem inventory, email me here
