Nested ESXi is a staple of resource-strapped labs. There is, however, a little something that’s worth keeping in mind when using NSX-v / VXLAN.
Correction: it’s the DST UDP port number, not SRC as in the original post. Thank you, Dumlu! :)
I recently came across a funny issue:
- Take a physical ESXi host, put it into a cluster enabled for NSX-v, configure VXLAN.
- Create a VLAN-backed dvPortgroup on the same DVS that was configured with VXLAN, and change its security settings to allow Forged Transmits and Promiscuous Mode.
- Deploy a nested ESXi to that physical host, and connect it to the VLAN-backed dvPortgroup, which will provide “Uplink” connectivity for its Management vmknic and VTEP.
- Add the nested ESXi host to a cluster enabled for NSX-v, and configure VXLAN.
Now we have a nested ESXi host with VXLAN running on a physical ESXi host, also with VXLAN.
Why would you do this? For example, if you don’t have enough physical hosts for some topology you’re trying to test. Whatever. Let’s imagine you did do the above. What’s important is that a VXLAN-prepared nested ESXi is sitting on a VXLAN-prepared physical host.
Now:
- Deploy a VM on that nested ESXi host, connected to a Logical Switch (VXLAN-backed dvPortgroup).
- Observe that the VM can’t communicate with anything outside the physical ESXi host where the nested ESXi is running.
- Run pktcap-uw on the physical host, and observe that the VM’s traffic, correctly VXLAN-encapsulated with the right Logical Switch’s VNI, is being received from the nested ESXi.
- Run pktcap-uw on the physical host’s Uplink, and observe that the traffic you just saw isn’t being sent out (example capture commands follow this list).
- Scratch your head.
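For reference, here is roughly what those captures look like. This is a sketch rather than exact syntax: the port ID, vmnic name, and output paths are placeholders, and the filter assumes the RFC port 4789 (use 8472 if that’s what your NSX Manager is set to).

# Find the vSwitch port ID of the nested ESXi VM's vNIC:
esxcli network vm list
esxcli network vm port list -w <worldID>
# Capture what arrives from the nested ESXi (dir 0 = input at the capture point):
pktcap-uw --switchport <portID> --dir 0 --dstport 4789 -o /tmp/from-nested.pcap
# Capture what actually leaves the physical Uplink (dir 1 = output):
pktcap-uw --uplink vmnic0 --dir 1 --dstport 4789 -o /tmp/to-wire.pcap
# Read the captures back:
tcpdump-uw -nr /tmp/from-nested.pcap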
Or:
- Configure the nested ESXi host as a member of a Replication Cluster for HW VTEP.
- Observe that the BFD state between this Replication Node and your HW VTEP is Down.
- Do the pktcap-uw motions similar to the above; observe the nested ESXi sending BFD packets toward the HW VTEP, correctly encapsulated in VXLAN with VNI=0, but the physical host refusing to send these out of its Uplink (see the capture sketch after this list).
- Scratch your head.
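The capture recipe is the same for the BFD case, since the outer header is still plain VXLAN-in-UDP. A sketch, assuming vmnic0 and the legacy port 8472 (substitute whatever your NSX Manager is configured with):

# BFD toward the HW VTEP rides inside VXLAN with VNI=0; watch the Uplink for it:
pktcap-uw --uplink vmnic0 --dir 1 --dstport 8472 -o /tmp/bfd-out.pcap
tcpdump-uw -nr /tmp/bfd-out.pcap
# The capture stays empty for as long as the host keeps eating the packets.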
So, what’s going on here?
A DVS enabled with VXLAN will drop UDP packets whose DST port matches the configured VXLAN port (8472 legacy, or 4789 per RFC 7348), even if they are not addressed to any of the VTEPs on this host, or carry a VLAN ID different from that of the VTEP’s dvPortgroup. NSX Manager can be configured to use only one VXLAN UDP port (either 8472 or 4789) for all hosts under its control, and that is the UDP port that will be affected.
So in our case, the nested ESXi host was trying to send VXLAN packets, and the physical host was dropping them. The same would be true for a regular VM running on the physical ESXi host sending UDP traffic with a DST port matching the VXLAN config of that physical host.
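To double-check which UDP port a prepared host is actually using, the host-side VXLAN CLI can tell you. From memory, so verify the exact commands and output on your version:

# NSX-v VXLAN module summary; the output should include the configured UDP port:
net-vdl2 -l
# Per-VDS VXLAN view on a prepared host:
esxcli network vswitch dvs vmware vxlan list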
What can be done about it?
Solution 1: use a separate DVS, not prepared with VXLAN, for the nested ESXi hosts. This will obviously require dedicated physical NIC(s) on the physical host for that DVS’s Uplink(s).
Solution 2: if the physical host and the nested ESXi are part of different VC+NSX domains, you could use different VXLAN UDP port settings, e.g., 8472 for the NSX Manager that manages the physical host, and 4789 for the one that manages the nested ESXi. There’s an API call to change which UDP port is used on all ESXi hosts managed by a given NSX Manager.
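As I remember it from the NSX-v 6.x API guide, the call looks like the one below; treat the exact path as something to verify against the docs for your version. The credentials and hostname are placeholders:

# Switch the whole NSX domain to the RFC port; NSX Manager pushes it to all its hosts:
curl -k -u 'admin:PASSWORD' -X PUT https://<nsx-manager>/api/2.0/vdn/config/vxlan/udp/port/4789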
Solution 3: run nested ESXi on physical host(s) not prepared for NSX.
Hope this helps :)
March 10th, 2016 at 5:40 pm
Hi Dmitri, good catch. I went through this in my lab just a couple of weeks ago, and I have to say it became a very good troubleshooting experience :)
As you know, while nested environments are very useful for home labs, they are not supported in production, so it can happen that something goes wrong once in a while.
Keep up the good work
March 10th, 2016 at 6:32 pm
Hi Luca :) Thanks for the comment. Worth remembering this also applies to regular VMs that happen to use the VXLAN UDP port… :)
March 12th, 2016 at 6:15 am
True, that is one of the reasons why VMware recommends setting the VXLAN UDP port to 4789, which is the IANA-assigned port per RFC 7348. While this can be accomplished today with an API call, as you mentioned, in a future release 4789 will be the default value. I do agree that this should be clearer in the documentation :)
April 10th, 2016 at 5:54 am
Another option: run nested VXLAN transport on VXLAN portgroup instead of VLAN. Watch out for MTU size.
April 10th, 2016 at 3:51 pm
Hi Tomas – thanks for the comment!
I didn’t look much further into which particular IOChain is dropping these packets, so in my mind it’s a 50% chance whether this would work. :)
Did you by chance try it? If yes, could you please let me know if it works, and with which ESXi and NSX versions? Thank you! :)
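On the MTU caveat: each VXLAN encapsulation adds roughly 50 bytes (outer Ethernet 14 + IP 20 + UDP 8 + VXLAN header 8), and nesting the transport means paying that twice. Rough arithmetic, assuming 1500-byte guest frames:

1500 (guest frame) + 50 (nested ESXi’s VXLAN)  = 1550
1550 + 50 (physical host’s VXLAN)              = 1600

So the usual 1600-byte underlay MTU is fully consumed; something like 1700 on the physical network leaves headroom.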
April 11th, 2016 at 1:35 am
Works fine.
April 11th, 2016 at 8:44 am
Sweet, thanks!
February 26th, 2020 at 11:53 pm
Hi,
Can you explain a little better how to implement that solution (running the nested VXLAN transport on a VXLAN portgroup instead of a VLAN one)?
This is my current layout:
Physical ESXi
  DS (MTU 1600)
    DPG-LAN (VLAN 1) – uplink 1 – physical NIC
    DPG-LAN-NESTED (VLAN trunk) – uplink 1 – physical NIC
    VXLAN configured on VLAN 4
  VM-TST0 – vNIC with virtual network on …virtualwire-1-sid-5000 of distributed switch DS
  Nested ESXi1 / Nested ESXi2 (both use DPG-LAN-NESTED as their virtual network)
    DS-TST
      DPG-TST-LAN (VLAN 1) – uplink 1 – virtual NIC
      VXLAN configured on VLAN 4
    VM-TST1 – vNIC with virtual network on …virtualwire-1-sid-5000 of distributed switch DS-TST
    VM-TST2 – vNIC with virtual network on …virtualwire-1-sid-5000 of distributed switch DS-TST
March 3rd, 2020 at 2:57 pm
I haven’t touched NSX for a good part of 5 years now, so I wouldn’t risk giving you an exact answer. :) One thing you could try is to spin up one of the NSX labs on https://labs.hol.vmware.com and have a look at the DVS uplink configuration there. These labs run nested (or at least they used to), and the DVS configuration they use could give you the necessary clues. IIRC, nested vSphere’s NSX is configured to use VLAN 0 on the “Configuring VXLAN networking” step, with vmknic teaming set to “Fail Over”.
HTH :)
July 10th, 2017 at 4:51 pm
Hi Dmitri,
I was trying option 2 (option 1 is not an option for me) and changed the UDP port for VXLAN, but it did not work.
When I do a capture on the nested ESXi host, I can see the packets dropped using pktcap-uw --capture drop --vmk vmk1
Captured at Drop point, Drop Reason ‘VlanTag Mismatch’. Drop Function ‘Net_AcceptRxList’. TSO not enabled, Checksum not offloaded and not verified, length 60.
Captured at Drop point, Drop Reason ‘VlanTag Mismatch’. Drop Function ‘Net_AcceptRxList’. TSO not enabled, Checksum not offloaded and not verified, length 60.
Captured at Drop point, Drop Reason ‘VXLAN Module Drop’. Drop Function ‘OverlayWrapperUplinkInputCB’. TSO not enabled, Checksum not offloaded and verified, length 110.
(payload removed from the packets above)
I am not sure why the packets are being dropped by the nested ESXi host itself. Packets are dropped only when I try to ping between VMs that are on different nested ESXi hosts; the VMs ping each other fine when they sit on the same nested ESXi host.
Cheers,
Om
July 11th, 2017 at 3:18 pm
It’s hard to say what the problem may be without looking at your setup. One thing I can say is that you should not be looking at the packets on the VTEP vmk: VXLAN traffic is never created or terminated on that interface; it’s all handled in the VXLAN IOChain on the Uplink.
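In other words, point the capture at the Uplink rather than the VTEP vmk; something along these lines, with vmnic0 standing in for your uplink:

# Watch the host's drop points, as you did, but without pinning it to the VTEP vmk:
pktcap-uw --capture drop
# And see what actually makes it out on the wire (dir 1 = output):
pktcap-uw --uplink vmnic0 --dir 1 -o /tmp/wire.pcap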
July 11th, 2017 at 1:38 am
Edit: I missed the following log entry in the above paste
Captured at Drop point, Drop Reason ‘VXLAN Module Drop’. Drop Function ‘OverlayWrapperUplinkOutputCB’. TSO not enabled, Checksum not offloaded and not verified, Vxlan 5001 but not encapsulated, length 98.
September 13th, 2017 at 3:04 am
Great post, solved my issue. The only thing I would have liked to see added is the syntax needed to run pktcap-uw.