With a Hardware VTEP being implemented in, well, hardware, how things work depends on the capabilities of the underlying chipset. This means that when we design solutions using these products, we need to keep those capabilities in mind and configure things accordingly.
In this short post I'll cover a situation we encountered at one of our customers where things "should" have worked but didn't, and what the reason turned out to be.
I was contacted to help solve a mysterious issue that popped up during a customer PoC, where a 6740 HW VTEP connected to an NSX environment was behaving in unexpected ways.
Everything looked great on the control plane – VTEP and MAC tables, both on the Controllers and the hosts – and on the underlay data plane – everything that should be able to ping each other, could. A traffic capture on an ESXi host showed that a VM on a logical switch bridged to a VLAN through the HW VTEP was sending correctly encapsulated traffic, and receiving traffic sent from a physical server.
But, two things weren’t happening: no traffic from VM arrived at the physical server, and BFD sessions between the HW VTEP and replication nodes were refusing to come up.
The first reaction was to check whether the ESXi hosts' VTEPs and the VTEP on the 6740 were in different IP subnets. Yep, they were.
Here are the relevant bits of a mock 6740’s config:
```
rbridge-id 10
!
interface Port-channel 20
 description L2 to ESXi hosts VTEP segment
 switchport
 switchport mode trunk
 switchport trunk allowed vlan all
!
interface Ve 110
 description Default gw for ESXi VTEPs
 ip address 192.168.0.1/24
!
interface Ve 112
 description HW VTEP
 ip address 192.168.1.10/24
 vrrp-extended-group 112
  virtual-mac 02e0.5200.00xx
  virtual-ip 192.168.1.1
  short-path-forwarding
!
overlay-gateway NSXv
 type hardware-vtep
 ip interface Ve 112 vrrp-extended-group 112
 attach rbridge-id add 10
 attach vlan 3000
 activate
!
nsx-controller nsxv
 ip address 172.16.0.100 port 6640
 activate
```
ESXi hosts were connected to the 6740 on L2 through Port-Channel 20, with their VTEPs sitting in VLAN 110. The default gateway (pay attention now, this is important!) for the ESXi VXLAN IP stack was set to 192.168.0.1, which, as you can see above, is on the interface Ve 110.

We could ping perfectly fine between either Ve 110 or Ve 112 and the ESXi VTEPs, so it wasn't a problem with connectivity. VXLAN packets leaving the ESXi hosts had the correct outer DMAC = MAC of Ve 110, since that's the default gateway used to reach the HW VTEP's IP.
On the ESXi hosts acting as replication nodes, we could see incoming and outgoing VXLAN packets with BFD PDUs inside, to and from the HW VTEP. Yet, no worky.
The next thing we tried was to switch over from the VRRP-E VTEP to a Loopback, by adding interface Loopback 1 and switching from Ve 112 to Loopback 1 in the overlay-gateway configuration:
```
rbridge-id 10
!
interface Port-channel 20
 switchport
 switchport mode trunk
 switchport trunk allowed vlan all
!
interface Loopback 1
 ip address 10.10.10.10/32
!
interface Ve 110
 ip address 192.168.0.1/24
!
interface Ve 112
 ip address 192.168.1.10/24
 vrrp-extended-group 112
  virtual-mac 02e0.5200.00xx
  virtual-ip 192.168.1.1
  short-path-forwarding
!
overlay-gateway NSXv
 type hardware-vtep
 ip interface Loopback 1
 attach rbridge-id add 10
 attach vlan 3000
 activate
!
nsx-controller nsxv
 ip address 172.16.0.100 port 6640
 activate
```
Well, still no worky. Puzzled, we brought in some engineering help (hi Ram!). After some further collective head-scratching, we noticed that the "In" packet counters on the BFD sessions on the 6740 weren't increasing, meaning that BFD packets from the ESXi hosts weren't reaching the BFD function. (Remember – BFD packets between the HW VTEP and ESXi travel inside the VXLAN tunnel with VNI=0.)
After seeing this, we decided to try deleting the VRRP-E configuration from Ve 112 (the only VRRP-E interface in this particular configuration). Lo and behold – we were cooking with gas!
With the problem out of the way, it was time to dig in to understand what caused it, and how to prevent it in the future.
The first clue was (surprise!) 🙂 in the documentation:
If the VXLAN packet entering a VDX VTEP-enabled device on a Layer 3 interface (such as a routing next hop) is different from the VRRP-E based-VE interface configured for the VTEP, but at an ingress interface where the VTEP VRRP-E VLAN is also configured, and the final destination is an NSX-configured VXLAN tunnel (as identified by the VXLAN tunnel parameters of source IP and destination IP), then the VXLAN traffic is routed to the VTEP interface in the VDX and is a candidate for VXLAN to VLAN bridging, but only if the destination mac of the ingressing VXLAN traffic is the same as that of the virtual mac of the VTEP VRRP-E session.
(A bit hard to parse for me, but once you know what you’re looking for, it kind of makes sense).
So let’s see how this applies to our situation.
Remember, since the 6740's VTEP is in a different IP subnet from the ESXi hosts' VTEPs, those hosts will use their default gateway as the next hop – which in our case happens to be on the same 6740 as the HW VTEP. So the ESXi hosts will set the outer DMAC in their VXLAN packets to the MAC of Ve 110, which is not the same as the VRRP-E virtual MAC of Ve 112. And, in line with the quote above, the 6740 will drop these VXLAN packets.
If we look carefully, we note that Port-Channel 20 has switchport trunk allowed vlan all on it, which of course includes VLAN 112 – the VLAN matching Ve 112, which has VRRP-E enabled. I know it's a different VLAN ID from the ingress VLAN 110! Looks like it doesn't matter, though.
In further lab testing we found that if we kept the original configuration (with Ve 112 used as the VTEP), but deleted VLAN 112 from Port-Channel 20, we were back to happy days.
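In config terms, that workaround could look something like the sketch below. This assumes NOS-style "switchport trunk allowed vlan remove" syntax on your software release – check your command reference:

```
interface Port-channel 20
 switchport
 switchport mode trunk
 ! Allow everything except VLAN 112 (the VRRP-E VTEP VLAN), so
 ! ingressing VXLAN packets never hit the virtual-MAC check
 ! described in the documentation quote above.
 switchport trunk allowed vlan all
 switchport trunk allowed vlan remove 112
```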
Another way would be to follow the recommendation in the document linked above, which is:
…creating VE interfaces for each of the ingressing transport VLANs on all the RBridges in the VCS, and then configuring them with the same VRRP-E VRID and virtual-mac address as the VRRP-E VRID and virtual-mac address that was configured for the VTEP.
Translating this to our situation: our "ingressing transport VLAN" is VLAN 110, so all we need to do is change the Ve 110 configuration to look something like:
```
!
interface Ve 110
 ip address 192.168.0.110/24
 vrrp-extended-group 112
  virtual-mac 02e0.5200.00xx
  virtual-ip 192.168.0.1
```
The changes are:
- Switch the main interface IP to a different one in the same IP range;
- Add vrrp-extended-group 112, with the corresponding virtual-ip set to Ve 110's original IP address.

Make sure that you use the same group number (112) as the one used on the Ve used for the VTEP.
This solution probably makes a lot of sense, since you'd typically have a pair of 6740s for your HW VTEP anyway, and configure default gateway redundancy for the ESXi VTEPs with VRRP-E on Ve 110. Just make sure to use the same VRRP-E group ID.
Q: What happens if I have multiple VRRP-E groups, each with a different group ID?
A: One of the virtual MACs will be chosen at random and programmed into the ports. You don't get to choose, AFAIK.
Q: So what do I do?
A: You should be able to safely use the same group ID for all relevant VRRP-E groups. It will use the same MAC on the associated VLANs, which should be fine since VLANs are separate MAC domains.
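For example, two Ve interfaces on different VLANs can share group ID 112 (a sketch – VLAN 120 and its addressing are invented for illustration):

```
! Both VLANs use VRRP-E group 112 and thus the same virtual MAC;
! since each VLAN is its own MAC domain, this reuse is fine.
interface Ve 110
 ip address 192.168.0.110/24
 vrrp-extended-group 112
  virtual-mac 02e0.5200.00xx
  virtual-ip 192.168.0.1
!
interface Ve 120
 ip address 192.168.2.110/24
 vrrp-extended-group 112
  virtual-mac 02e0.5200.00xx
  virtual-ip 192.168.2.1
```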
Q: Does it matter if I use a Loopback or a Ve for my HW VTEP?
A: Not for this problem – as described above, we saw the same behaviour with both.
Q: Is there an alternative solution?
A: You can add an external L3 hop between your ESXi VTEPs and the VTEP on the 6740. For an example, please see the "Infrastructure" diagram in my previous post on the topic. The L3 hop in question is the one labelled "Router (Data)".
Q: What about removing the VLAN IDs of Ve interfaces with VRRP-E from the interface where the VXLAN packets come in?
A: Yep, you can do that, too – as long as you remove all VLANs with Ve interfaces that have VRRP-E. The likely scenario for this is where you have only a single 6740 node, as in our pictured case.