Nested ESXi is a staple of resource-strapped labs. There is, however, a little something that’s worth keeping in mind when using NSX-v / VXLAN.
Correction: it’s DST UDP port number, not SRC as per original post. Thank you, Dumlu! 😉
I recently came across a funny issue:
- Take a physical ESXi host, put it into a cluster enabled for NSX-v, configure VXLAN.
- Create a VLAN-backed dvPortgroup on the same DVS that was configured with VXLAN, and change its security settings to allow Forged Transmits and Promiscuous Mode.
- Deploy a nested ESXi to that physical host, and connect it to the VLAN-backed dvPortgroup, which will provide “Uplink” connectivity for its Management vmknic and VTEP.
- Add the nested ESXi host to a cluster enabled for NSX-v, and configure VXLAN.
Now we have a nested ESXi host with VXLAN running on a physical ESXi host, also with VXLAN.
Why would you do this? For example, if you don’t have enough physical hosts for some topology you’re trying to test. Whatever. Let’s imagine you did do the above. What’s important is that a VXLAN-prepared nested ESXi is sitting on a VXLAN-prepared physical host.
- Deploy a VM on that nested ESXi host, connected to a Logical Switch (VXLAN-backed dvPortgroup).
- Observe that the VM can’t communicate with anything outside the physical ESXi host where the nested ESXi is running.
- Run pktcap-uw on the physical host, and observe that the VM’s traffic, correctly VXLAN-encapsulated with the right Logical Switch’s VNI, is being received from the nested ESXi.
- Run pktcap-uw on the physical host’s Uplink, and observe that the traffic you just saw isn’t being sent out.
- Scratch your head.
- Configure the nested ESXi host as a member of a Replication Cluster for HW VTEP.
- Observe that the BFD state between this Replication Node and your HW VTEP is Down.
- Go through the same pktcap-uw motions as above; observe the nested ESXi sending BFD packets toward the HW VTEP, correctly encapsulated in VXLAN with VNI=0, but the physical host refusing to send them out of its Uplink.
- Scratch your head.
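When reading captures like the ones in the steps above, it can help to decode the VXLAN header by hand and confirm the VNI (the Logical Switch's VNI for VM traffic, 0 for the BFD packets). Below is a minimal sketch that assumes you have already extracted the UDP payload bytes from a capture; the function name and the sample bytes are made up for illustration. The 8-byte header layout follows RFC 7348.

```python
import struct

VXLAN_HEADER_LEN = 8
VNI_VALID_FLAG = 0x08  # the "I" flag defined in RFC 7348


def parse_vxlan_header(udp_payload: bytes) -> dict:
    """Return the I flag and VNI from the first 8 bytes of a VXLAN UDP payload."""
    if len(udp_payload) < VXLAN_HEADER_LEN:
        raise ValueError("payload is shorter than a VXLAN header")
    # Two big-endian 32-bit words:
    #   word1 = flags (8 bits) + reserved (24 bits)
    #   word2 = VNI (24 bits) + reserved (8 bits)
    word1, word2 = struct.unpack("!II", udp_payload[:VXLAN_HEADER_LEN])
    flags = word1 >> 24
    return {
        "vni_valid": bool(flags & VNI_VALID_FLAG),
        "vni": word2 >> 8,
    }


# Hypothetical sample: a header carrying VNI 5001, followed by a dummy inner frame.
sample = struct.pack("!II", VNI_VALID_FLAG << 24, 5001 << 8) + b"\x00" * 14
print(parse_vxlan_header(sample))  # {'vni_valid': True, 'vni': 5001}
```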
So, what’s going on here?
A DVS enabled for VXLAN will drop UDP packets whose DST port matches the VXLAN port (8472 legacy, or 4789 per RFC 7348), even if they are not addressed to any of the VTEPs on this host, or carry a VLAN ID different from that of the VTEP’s dvPortgroup. NSX Manager can be configured to use only one VXLAN UDP port (either 8472 or 4789) for all hosts under its control, and that is the one port that will be affected.
So in our case the nested ESXi host was trying to send VXLAN packets, and the physical host was dropping them. The same would be true if a regular VM running on the physical ESXi host were sending UDP traffic with a DST port matching the VXLAN config of that physical host.
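To make the rule concrete, here is a toy model of the behaviour described above. It is not how the DVS is implemented, only a sketch of what was observed: for a packet sent by a VM on that DVS, the decision depends on the DST UDP port alone. The class, function, addresses, VLAN IDs and the 8472 setting are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UdpPacket:
    dst_ip: str
    dst_port: int
    vlan_id: int


def dvs_drops(packet: UdpPacket, host_vxlan_port: int) -> bool:
    """Model of the observed rule: dst_ip and vlan_id play no part in the decision."""
    return packet.dst_port == host_vxlan_port


PHYSICAL_HOST_VXLAN_PORT = 8472  # assumed setting on the physical host's NSX Manager

# Nested ESXi VTEP traffic: different VLAN, remote destination, same UDP port.
nested_vtep = UdpPacket(dst_ip="192.0.2.10", dst_port=8472, vlan_id=20)
# The same traffic if the nested ESXi were using the other VXLAN port.
nested_vtep_other_port = UdpPacket(dst_ip="192.0.2.10", dst_port=4789, vlan_id=20)

print(dvs_drops(nested_vtep, PHYSICAL_HOST_VXLAN_PORT))             # True
print(dvs_drops(nested_vtep_other_port, PHYSICAL_HOST_VXLAN_PORT))  # False
```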
What can be done about it?
Solution 1: use a separate DVS not prepared with VXLAN for the nested ESXi hosts. This obviously will need physical NIC(s) on the physical host for that DVS’s Uplink(s).
Solution 2: if the physical host and the nested ESXi are part of different VC+NSX domains, you could use a different VXLAN UDP port setting for each, e.g., 8472 for the NSX Manager that manages the physical host, and 4789 for the other NSX Manager that manages the nested ESXi. There’s an API call to change which UDP port is used on all ESXi hosts managed by a given NSX Manager.
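For Solution 2, the sketch below builds (but deliberately does not send) the request that would change the port. The endpoint path is my recollection of the NSX-v API guide (`PUT /api/2.0/vdn/config/vxlan/udp/port/<port>`), so treat it as an assumption and check it against the guide for your NSX version before use. The hostname, credentials and function name are placeholders.

```python
import base64
import urllib.request


def build_vxlan_port_request(nsx_manager: str, port: int,
                             user: str, password: str) -> urllib.request.Request:
    """Build a PUT request for changing the VXLAN UDP port; nothing is sent."""
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    # ASSUMPTION: endpoint path as recalled from the NSX-v API guide; verify it.
    url = f"https://{nsx_manager}/api/2.0/vdn/config/vxlan/udp/port/{port}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        method="PUT",
        headers={"Authorization": f"Basic {token}"},
    )


# Placeholder NSX Manager and credentials, for illustration only.
req = build_vxlan_port_request("nsxmgr.lab.example", 4789, "admin", "changeme")
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen`, with certificate handling that suits your lab.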
Solution 3: run nested ESXi on physical host(s) not prepared for NSX.
Hope this helps 🙂