NSX for vSphere: Understanding Transport Zone scoping

As part of NSX preparation for logical switching and routing, it is necessary to define at least one Transport Zone (from here on – “TZ”).

It is obvious from the UI that TZ configuration includes default VXLAN Control Plane mode and a list of ESXi clusters; but what does it actually do?

Let’s find out.

TZ and Logical Switches’ Control Plane mode

This one’s easy: VXLAN Control Plane Mode selected when a TZ is created specifies the default Control Plane Mode for Logical Switches created in that TZ. In other words, when you create an LS via UI or API, TZ CP mode will be selected for your new LS by default. However, you can override it per individual LS, and of course you can change it later.
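A minimal sketch of that defaulting rule, in Python. The class and function names are illustrative only and are not part of any NSX API. The three mode names follow the ones NSX for vSphere offers (Unicast, Hybrid, Multicast).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ControlPlaneMode(Enum):
    UNICAST = "unicast"
    HYBRID = "hybrid"
    MULTICAST = "multicast"


@dataclass
class TransportZone:
    name: str
    default_cp_mode: ControlPlaneMode


@dataclass
class LogicalSwitch:
    name: str
    cp_mode: ControlPlaneMode


def create_logical_switch(
    tz: TransportZone, name: str, cp_mode: Optional[ControlPlaneMode] = None
) -> LogicalSwitch:
    """Use the TZ default unless the caller overrides it for this LS."""
    chosen = cp_mode if cp_mode is not None else tz.default_cp_mode
    return LogicalSwitch(name, chosen)


tz = TransportZone("TZ1", ControlPlaneMode.UNICAST)
web = create_logical_switch(tz, "web")                           # inherits the TZ default
app = create_logical_switch(tz, "app", ControlPlaneMode.HYBRID)  # per-LS override
print(web.cp_mode.name, app.cp_mode.name)
```

The TZ setting is only consulted at creation time in this model, which is why two switches in the same TZ can end up in different modes.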

It is perfectly fine to have Logical Switches in different CP modes within the same TZ; however, it may make recovery from some system failures, such as total loss of the Controller Cluster (e.g., due to a datastore failure or human error), more time-consuming.

Needless to say, Hybrid and Multicast CP modes require that you configure a pool of multicast IP addresses for NSX to draw from.

TZ, Logical Switches, and DVS

This one’s not very obvious, but makes sense if you think about it. Logical Switches are not explicitly “created” on ESXi hosts. When an LS is created, a TZ is specified for it to live in. NSX Manager looks up that TZ’s config to see which clusters are included in it, and builds a list of DVS prepared for VXLAN that correspond to that TZ. Then, NSX Manager creates a “special” dvPortgroup on this or these DVS, and tells Controller Cluster about that new LS. That’s it as far as NSX Manager is concerned. DVS component will take care of informing hosts of the new dvPg.
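The NSX Manager side of this sequence can be sketched roughly as follows. This is a toy model, not NSX code: the function names, the cluster and DVS names, and the `pg-<name>` portgroup naming are all hypothetical placeholders.

```python
from typing import Dict, List, Set, Tuple


def dvs_for_transport_zone(
    tz_clusters: Set[str], cluster_to_dvs: Dict[str, str]
) -> Set[str]:
    """Return the DVS that back the clusters listed in the TZ."""
    return {cluster_to_dvs[cluster] for cluster in tz_clusters}


def create_logical_switch(
    ls_name: str, tz_clusters: Set[str], cluster_to_dvs: Dict[str, str]
) -> List[Tuple[str, str]]:
    """Return the (DVS, portgroup) pairs that would be created for a new LS.

    The portgroup is created once per DVS, not once per cluster or per host.
    """
    portgroup = f"pg-{ls_name}"  # hypothetical naming, for illustration only
    return sorted(
        (dvs, portgroup)
        for dvs in dvs_for_transport_zone(tz_clusters, cluster_to_dvs)
    )


# Hypothetical inventory: two clusters share dvs-1, one cluster uses dvs-2.
cluster_to_dvs = {"cluster-1": "dvs-1", "cluster-2": "dvs-1", "cluster-3": "dvs-2"}
pgs = create_logical_switch("web", {"cluster-2", "cluster-3"}, cluster_to_dvs)
print(pgs)
```

Note that the TZ's cluster list is only used to find the DVS; after that, the unit of work is the DVS itself.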

When a VM is connected to such a special dvPg (or a DLR is configured with a VXLAN LIF), the ESXi host requests further information from the Controller Cluster, which allows the VXLAN kernel module on that host to do its job.

This has an interesting side effect: if you didn’t add all clusters of a given DVS to the TZ, those clusters you haven’t added will still have access to that Logical Switch. Let’s have a look at the following diagram:

Diagram

We have three clusters here (Comp A, Comp B, and Mgmt / Edge), and two DVS (Compute_DVS and Mgmt_Edge_DVS). If we were to create a TZ that included say clusters Comp B and Mgmt / Edge, but didn’t include Comp A, any LS created against that TZ will still be “available” to VMs running on cluster Comp A.

Why? Because LS is really a DVS Portgroup, and the cluster Comp A is a member of the DVS that’s part of the TZ.

What will happen if we connect a VM running on cluster Comp A to a dvPg that corresponds to such an LS? VXLAN will work just fine: VMs on that LS will be able to communicate across all three clusters. The trouble starts when we connect such an LS to a DLR.

TZ and DLR

Unlike LS, DLR instances are created by NSX Manager on each ESXi host explicitly. This procedure has no relationship to, or dependency on, the DVS, and it does follow the TZ scope strictly.

This means that in our hypothetical case, if we were to create a DLR and connect to it the LS we created earlier, a DLR instance would get created on hosts in clusters Comp B and Mgmt / Edge, but not on hosts in cluster Comp A:

TZ is misaligned with the DVS boundary

This would cause an “interesting” situation, where Comp A VMs will not be able to reach their DLR’s LIF, because DLR, along with its LIFs, simply wasn’t created on these hosts.

So in the diagram above, VMs web1, web2 and LB can talk to each other just fine; but VMs app1 and db1 will not be able to talk to anything. Additionally, VM web1 won’t be able to talk to anything other than what is connected to the same VXLAN as itself (5001). This will surely appear confusing.
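To make the two scoping rules concrete, here is a small Python model of the example topology. It is a sketch under two assumptions: every cluster has been prepared for VXLAN (so hosts have VTEPs), and the helper name `classify` and its status strings are made up for illustration. LS visibility is derived from DVS membership, while DLR presence is derived from TZ membership.

```python
from typing import Dict, Set

# Topology from the example: two DVS, three clusters.
CLUSTER_TO_DVS = {
    "Comp A": "Compute_DVS",
    "Comp B": "Compute_DVS",
    "Mgmt / Edge": "Mgmt_Edge_DVS",
}
# The misaligned TZ: Comp A is left out.
TZ_CLUSTERS = {"Comp B", "Mgmt / Edge"}


def classify(cluster_to_dvs: Dict[str, str], tz_clusters: Set[str]) -> Dict[str, str]:
    """Describe what VMs in each cluster get from an LS/DLR built on this TZ."""
    tz_dvs = {cluster_to_dvs[c] for c in tz_clusters}
    result = {}
    for cluster, dvs in cluster_to_dvs.items():
        sees_ls = dvs in tz_dvs           # the LS portgroup follows the DVS
        has_dlr = cluster in tz_clusters  # the DLR instance follows the TZ
        if sees_ls and has_dlr:
            result[cluster] = "L2 and routing"
        elif sees_ls:
            result[cluster] = "L2 only, no DLR LIF"
        else:
            result[cluster] = "no access"
    return result


status = classify(CLUSTER_TO_DVS, TZ_CLUSTERS)
for cluster, state in status.items():
    print(f"{cluster}: {state}")
```

The "L2 only, no DLR LIF" state is the confusing one: VMs there can reach peers on the same VXLAN but have no gateway.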

Conclusion

When creating your Transport Zones, make sure to include all clusters in each DVS, i.e., align the TZ to the DVS boundary. Also, forewarned is forearmed. 😉
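If you keep an inventory of which cluster sits on which DVS, the alignment rule is easy to check mechanically. The following Python sketch assumes such a mapping is available as a plain dictionary; the function name `missing_clusters` is hypothetical, and nothing here talks to NSX or vCenter.

```python
from collections import defaultdict
from typing import Dict, Set


def missing_clusters(
    tz_clusters: Set[str], cluster_to_dvs: Dict[str, str]
) -> Dict[str, Set[str]]:
    """For each DVS the TZ touches, list clusters on that DVS the TZ omits.

    An empty result means the TZ is aligned to the DVS boundary.
    """
    dvs_members = defaultdict(set)
    for cluster, dvs in cluster_to_dvs.items():
        dvs_members[dvs].add(cluster)

    gaps = {}
    for dvs in {cluster_to_dvs[c] for c in tz_clusters}:
        omitted = dvs_members[dvs] - tz_clusters
        if omitted:
            gaps[dvs] = omitted
    return gaps


inventory = {
    "Comp A": "Compute_DVS",
    "Comp B": "Compute_DVS",
    "Mgmt / Edge": "Mgmt_Edge_DVS",
}
misaligned = missing_clusters({"Comp B", "Mgmt / Edge"}, inventory)
aligned = missing_clusters({"Comp A", "Comp B", "Mgmt / Edge"}, inventory)
print(misaligned, aligned)
```

For the example in this post, the check reports Comp A as the cluster to add under Compute_DVS.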

About Dmitri Kalintsev

Some dude with a blog and opinions ;)

24 responses to “NSX for vSphere: Understanding Transport Zone scoping”

  • michael

    Very clear, makes sense…thanks for the great explanation!

  • Saenz,Michael

    Great post! Thank you!

  • NSX vCloud Retrofit: Logical Network Preparation and Transport Zone Setup - VIRTUALIZATION IS LIFE!

    […] has released a series of excellent blog posts relating to NSX…the latest goes through Transport Zones in super deep dive detail. If you are not following Dimitri and you are interested in […]

  • NSX Link-O-Rama | vcdx133.com

    […] NSX for vSphere: Understanding Transport Zone scoping by Telecom Occasionally – NEW! […]

  • Mohamed

    Thanks for the post.
    So just to confirm, if the DLR is created for another LS on “Comp A”, LIFs related to web will be created or not?

    • Dmitri Kalintsev

      LIF creation will respect the TZ configuration. If TZ of Web LS doesn’t include “Comp A”, DLR LIFs for that LS will not be created.

      Disclaimer: this post is rather dated, and I haven’t touched NSX for good three years now, so please keep this in mind in case things have changed.

  • manchu

    please stop using acronyms 😦 hard to follow when reading

  • tronar

    There’s something I don’t follow: if cluster A was not in any TZ, then why would it have a VTEP? And vL2 traffic should go through VTEPs, right? So how would that work? (vL2 meaning L2 on an LS, i.e. virtual L2.)

    • Dmitri Kalintsev

      Any ESXi host in a cluster that has been prepared for NSX will have at least one VTEP. TZ configuration is a separate operation, which is why the situation with misaligned TZ scope is possible.

      Also, traffic does not technically “go” through VTEP; you can easily confirm this by running packet capture on your VTEP vmknic. Instead, VXLAN packets are “constructed” and “deconstructed” in uplink IOchain, while “borrowing” IP and MAC address of the VTEP vmknic.

      • tronar

        Hmm, VTEPs are tunnel endpoints; if you follow the metaphor, going through the tunnel requires crossing both tunnel endpoints, but that is lyrics. The point is that if you don’t have a VTEP, traffic cannot go even if you are on the same portgroup of the same VDS. Even if you had a VTEP, wouldn’t it not be assigned to the LS in the controllers? I guess it’s time for labbing this out.

      • Dmitri Kalintsev

        You are typically expected to prepare all clusters that are members of the same DVS; in this example case, DVS ‘Compute_DVS’ that spans clusters ‘Comp A’ and ‘Comp B’. Once you do this, all ESXi hosts in all prepared clusters will have VTEPs, irrespective of their TZ configuration.

  • tronar

    I’ve done the lab, and it does not work as you say (i.e. that VXLAN traffic works even in clusters that are not part of the TZ). Which is what I would have expected but not what is said everywhere. Even L2 traffic should go over VXLAN, and that requires your MAC to be in the VNI MAC table, which is controlled by the TZ, and your VTEP to go into the VTEP table. None of that happens if your cluster is not in the TZ.

    • Dmitri Kalintsev

      This post talks about scoping of the *DLR*, not VTEPs. In summary it says that VTEP/VXLAN is scoped at the DVS boundaries and does not respect TZ, while DLR strictly follows TZ boundaries.

      What you say is exactly the expected behaviour, for L2.

      • tronar

        What I say is that VXLAN does respect TZ too!

      • Dmitri Kalintsev

        This post is over 4 years old, so it’s entirely possible that the behaviour has changed since then. Which version of NSX did you use for your lab?

      • tronar

        6.4.1, a fairly recent version. But the underlying reason is conceptual. The non-TZ VTEPs are not listed as participating in the LS (VXLAN VNI). Maybe it did work before, though (by mistake, I would say, given that one use case for TZs is as a security measure to isolate zones).

      • Dmitri Kalintsev

        Ok, I spun up a HOL lab HOL-1903-01-NET (NSX 6.4.1), created a new TZ across RegionA01_COMP01 and RegionA01_MGMT01, and then created a new LS in this new TZ. Note that cluster RegionA01_COMP02 is not included. I then attached two VMs to that LS: web01a_corp.local (which sits in cluster RegionA01_MGMT01) and web04a_corp.local (which sits in cluster RegionA01_COMP02; note this cluster is not in my new TZ). I then changed the IP on web04a to 172.16.10.12 so it’s on the same subnet as web01a, made sure that the web04a network interface was “connected” (by default it isn’t), and tried to ping. It works, as expected, despite web04a sitting on a host that isn’t included in the TZ.

      • tronar

        Amazing, thank you very much for your time. I did reproduce it in the HOL, the only thing that I notice that is different is ESXi version, HOL uses 6.5 and I’m running 6.7 here, but now that I have a working alternative, I’ll try to nail it. Thanks again.

      • Dmitri Kalintsev

        You’re welcome. If you’re into understanding what happens under the covers, I would recommend checking out the NSX Troubleshooting guide that has a good chunk of my work in it, too: https://docs.vmware.com/en/VMware-NSX-for-vSphere/6.4/nsx_64_troubleshooting.pdf

  • tronar

    Ok, now I know what happened: originally I had not prepared one of the clusters, and then, when I prepared the second, I failed to configure VXLAN, so no VTEPs were created and I missed the clues. No VTEPs, no connection. As soon as I fixed that, the VTEP and MAC tables populated with extra transport zone host data, and we know the rest. You have been very supportive. Thanks.

  • NSX – Mon Wiki

    […] I invite you to read this article to learn more about the transport zone: https://telecomoccasionally.wordpress.com/2014/12/27/nsx-for-vsphere-understanding-transport-zone-sc… […]
