From the dept of the knowledge arcane: NSX-v with nested ESXi

Nested ESXi is a staple of resource-strapped labs. There is, however, a little something that’s worth keeping in mind when using NSX-v / VXLAN.

Correction: it’s DST UDP port number, not SRC as per original post. Thank you, Dumlu! 😉

I recently came across a funny issue:

  • Take a physical ESXi host, put it into a cluster enabled for NSX-v, configure VXLAN.
  • Create a VLAN-backed dvPortgroup on the same DVS that was configured with VXLAN, and change its security settings to allow Forged Transmits and Promiscuous Mode.
  • Deploy a nested ESXi to that physical host, and connect it to the VLAN-backed dvPortgroup, which will provide “Uplink” connectivity for its Management vmknic and VTEP.
  • Add the nested ESXi host to a cluster enabled for NSX-v, and configure VXLAN.

Now we have a nested ESXi host with VXLAN running on a physical ESXi host, also with VXLAN.

Why would you do this? For example, if you don’t have enough physical hosts for some topology you’re trying to test. Whatever. Let’s imagine you did do the above. What’s important is that VXLAN-prepared nested ESXi is sitting on a VXLAN-prepared physical host.

Now:

  • Deploy a VM on that nested ESXi host, connected to a Logical Switch (VXLAN-backed dvPortgroup).
  • Observe that the VM can’t communicate with anything outside the physical ESXi host where the nested ESXi is running.
  • Run pktcap-uw on the physical host, and observe that VM’s traffic, correctly VXLAN-encapsulated, with the right Logical Switch’s VNI, is being received from the nested ESXi.
  • Run pktcap-uw on the physical host’s Uplink, and observe that the traffic you just saw isn’t being sent out.
  • Scratch your head.

Or:

  • Configure the nested ESXi host as a member of a Replication Cluster for HW VTEP.
  • Observe that the BFD state between this Replication Node and your HW VTEP is Down.
  • Do the pktcap-uw motions similar to the above; observe the nested ESXi sending the BFD packets toward the HW VTEP, correctly encapsulated in VXLAN with VNI=0, but the physical host refusing to send them out of its Uplink.
  • Scratch your head.
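If you save those captures and want to double-check the VNI by hand, the VXLAN header is easy to decode. Here's a minimal Python sketch, assuming you already have the raw UDP payload bytes of a captured packet. The function name `parse_vxlan_header` is made up for illustration; the header layout (8 bytes: flags, 24 reserved bits, 24-bit VNI, 8 reserved bits) is per RFC 7348.

```python
import struct

# "I" flag from RFC 7348: set when the VNI field is valid.
VXLAN_FLAG_VNI_VALID = 0x08


def parse_vxlan_header(udp_payload: bytes) -> dict:
    """Decode the 8-byte VXLAN header at the start of a UDP payload.

    Hypothetical helper for illustration only; it does not validate
    anything beyond the header length.
    """
    if len(udp_payload) < 8:
        raise ValueError("payload is shorter than a VXLAN header")
    # 1 byte of flags, 3 reserved bytes, then a 32-bit word holding
    # the 24-bit VNI followed by 8 reserved bits.
    flags, vni_word = struct.unpack("!B3xI", udp_payload[:8])
    return {
        "vni_valid": bool(flags & VXLAN_FLAG_VNI_VALID),
        "vni": vni_word >> 8,
    }


# Hand-built header for a Logical Switch with VNI 5001.
sample = bytes([0x08, 0, 0, 0]) + (5001 << 8).to_bytes(4, "big")
print(parse_vxlan_header(sample))
```

The BFD packets toward the HW VTEP should decode the same way, just with a VNI of 0.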

So, what’s going on here?

The DVS enabled with VXLAN will drop UDP packets with DST port = VXLAN (8472 legacy, or 4789 per RFC), even if they are not addressed to any of the VTEPs on this host, or have a VLAN ID different from that of the VTEP’s dvPortgroup. NSX Manager can be configured to use only one VXLAN UDP port (either 8472 or 4789) for all hosts under its control, and that’s the one UDP port that will be affected.
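To make the rule concrete, here's a toy Python model of the behaviour as I observed it. This is not how the DVS is implemented; `UdpPacket` and `dropped_by_vxlan_dvs` are made-up names, and the addresses are documentation examples. The point is that only the destination port takes part in the decision, while the destination IP and VLAN ID are ignored.

```python
from dataclasses import dataclass

LEGACY_VXLAN_PORT = 8472  # legacy default
RFC_VXLAN_PORT = 4789     # assigned per RFC


@dataclass
class UdpPacket:
    dst_ip: str
    dst_port: int
    vlan_id: int


def dropped_by_vxlan_dvs(pkt: UdpPacket, configured_vxlan_port: int) -> bool:
    """Toy model of the observed drop: dst_ip and vlan_id play no part."""
    return pkt.dst_port == configured_vxlan_port


# VXLAN packet from a nested ESXi host, not addressed to a local VTEP
# and on a different VLAN than the physical host's VTEP dvPortgroup.
nested = UdpPacket(dst_ip="192.0.2.50", dst_port=RFC_VXLAN_PORT, vlan_id=200)

print(dropped_by_vxlan_dvs(nested, RFC_VXLAN_PORT))     # same port on both layers
print(dropped_by_vxlan_dvs(nested, LEGACY_VXLAN_PORT))  # different ports
```

The first call models the broken setup where both layers use the same port; the second models the physical host using the other port, in which case the nested host's packets are not matched.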

So in our case nested ESXi host was trying to send VXLAN packets, and physical host was dropping them. The same would be true if a regular VM running on the physical ESXi host was sending UDP traffic with the DST port matching VXLAN config of that physical host.

What can be done about it?

Solution 1: use a separate DVS not prepared with VXLAN for the nested ESXi hosts. This obviously will need physical NIC(s) on the physical host for that DVS’s Uplink(s).

Solution 2: If the physical host and the nested ESXi are part of different VC+NSX domains, you could use different VXLAN UDP port settings, e.g., 8472 for the NSX Manager that manages the physical host, and 4789 for the other NSX Manager that manages the nested ESXi. There’s an API call to change which UDP port is used on all ESXi hosts managed by a given NSX Manager.

Solution 3: run nested ESXi on physical host(s) not prepared for NSX.
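For Solution 2, here's a rough Python sketch of what the API call could look like. The endpoint path is as I recall it from the NSX-v API guide, so treat it as an assumption and check the guide for your version before using it. The hostname is made up, the helper name `build_vxlan_port_request` is hypothetical, and authentication is left out. The code only builds the request; it doesn't send anything.

```python
import urllib.request


def build_vxlan_port_request(nsx_manager: str, port: int) -> urllib.request.Request:
    """Build (but do not send) a PUT request to change the VXLAN UDP port.

    The URL path is an assumption based on the NSX-v API guide;
    verify it against the documentation for your NSX version.
    """
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid UDP port: {port}")
    url = f"https://{nsx_manager}/api/2.0/vdn/config/vxlan/udp/port/{port}"
    return urllib.request.Request(url, method="PUT")


# Hypothetical NSX Manager that manages the nested ESXi hosts.
req = build_vxlan_port_request("nsxmgr-nested.lab.example", 4789)
print(req.get_method(), req.full_url)
```

Sending it would additionally need NSX Manager credentials and whatever certificate handling your lab requires.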

Hope this helps 🙂

About Dmitri Kalintsev

Some dude with a blog and opinions ;)

10 responses to “From the dept of the knowledge arcane: NSX-v with nested ESXi”

  • Luca Morelli

    Hi Dmitri, good catch. I went through this in my lab just a couple of weeks ago and I have to say it became a very good troubleshooting experience 🙂
    As you know, while nested environments are very useful for home labs, they are not supported in production, so it can happen that something goes wrong once in a while.
    Keep up the good work

    • Dmitri Kalintsev

      Hi Luca 🙂 Thanks for the comment. Worth remembering this also applies to regular VMs that happen to use the VXLAN UDP port. 🙂

      • Luca Morelli

        True, that is one of the reasons why VMware recommends setting the VXLAN UDP port to 4789, which is reserved as per the RFC. While this can be accomplished today with an API call, as you mentioned, in a future release 4789 will be the default value. I do agree that this should be clearer in the documentation 🙂

  • Tomas Fojta

    Another option: run nested VXLAN transport on VXLAN portgroup instead of VLAN. Watch out for MTU size.

  • Om

    Hi Dmitri,

    I was trying the option 2 (option 1 is not an option for me) and changed the UDP port for VXLAN but it did not work.

    When I do a capture on the nested ESXi host, I can see the packets dropped using pktcap-uw --capture drop --vmk vmk1

    Captured at Drop point, Drop Reason ‘VlanTag Mismatch’. Drop Function ‘Net_AcceptRxList’. TSO not enabled, Checksum not offloaded and not verified, length 60.
    Captured at Drop point, Drop Reason ‘VlanTag Mismatch’. Drop Function ‘Net_AcceptRxList’. TSO not enabled, Checksum not offloaded and not verified, length 60.
    Captured at Drop point, Drop Reason ‘VXLAN Module Drop’. Drop Function ‘OverlayWrapperUplinkInputCB’. TSO not enabled, Checksum not offloaded and verified, length 110.

    (payload removed from the packets above)

    I am not sure why the packets are being dropped by the nested ESXi host itself. Packets are dropped only when I try to ping between VMs that are on different nested ESXi hosts. VMs can ping each other when they sit on the same nested ESXi host.

    Cheers,

    Om

    • Dmitri Kalintsev

      It’s hard to say what may be the problem without looking at your setup. One thing I can say is that you should not be looking at the packets on a VTEP vmk. VXLAN traffic is never created or terminated on that interface; it’s all handled in a VXLAN IOChain of Uplink vmknic.

  • Om

    Edit: I missed the following log entry in the above paste

    Captured at Drop point, Drop Reason ‘VXLAN Module Drop’. Drop Function ‘OverlayWrapperUplinkOutputCB’. TSO not enabled, Checksum not offloaded and not verified, Vxlan 5001 but not encapsulated, length 98.
