When troubleshooting, it’s important to know how the system is supposed to work, so that one can tell whether the currently observed behaviour is normal or not.
This post covers VXLAN ARP suppression in NSX-v.
NSX-v Logical Switches, implemented as VXLAN, include ARP suppression functionality (which can be turned off per logical switch as of NSX 6.1). This functionality is there to minimise the amount of ARP traffic flooding within individual VXLAN segments, i.e., between VMs connected to the same Logical Switch.
Disclaimer: this post uses a few unsupported/undocumented commands, which means “if you use them and break something, it’s 100% your responsibility”. The primary purpose of this post is reader education, not a call to action.
VXLAN ARP suppression is a function of Switch Security module (SwSec), which is a dvFilter attached to VMs’ vNICs.
We can see that dvFilter with the summarize-dvfilter command:
[skip] ...
world 1442427 vmm0:web-sv-02a vcUuid:'50 26 b7 4d c5 6c 1e d9-47 c0 09 25 95 80 2f ad'
 port 50331657 web-sv-02a.eth0
  vNic slot 2
   name: nic-1442427-eth0-vmware-sfw.2
   agentName: vmware-sfw
   state: IOChain Attached
   vmState: Detached
   failurePolicy: failClosed
   slowPathID: none
   filter source: Dynamic Filter Creation
  vNic slot 1
   name: nic-1442427-eth0-dvfilter-generic-vmware-swsec.1
   agentName: dvfilter-generic-vmware-swsec   <= Here it is
   state: IOChain Attached
   vmState: Detached
   failurePolicy: failClosed
   slowPathID: none
   filter source: Alternate Opaque Channel
Note: DLR is connected to a special VDS port called “vdrPort”, which doesn’t have SwSec dvFilter; therefore DLR does not benefit from VXLAN ARP suppression. I will cover details on DLR ARP resolution process in a future post.
NSX Control Plane logs are written to /var/log/netcpa.log on the ESXi host. The default logging level is “info”; it needs to be changed to “verbose” to observe the VXLAN control plane operations described here. To do that (at your own risk!):
1) chmod +wt /etc/vmware/netcpa/netcpa.xml
2) Edit /etc/vmware/netcpa/netcpa.xml and change “info” in the <level></level> section of <log> to “verbose”
3) /etc/init.d/netcpad restart
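The edit in step 2 can also be scripted against a copy of the file. Below is a minimal Python sketch, assuming only what is described above: a <log> element containing a <level> child. The SAMPLE string and the set_log_level name are hypothetical stand-ins for illustration; the real netcpa.xml contains more settings and its exact layout is not shown in this post.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for netcpa.xml; the real file has more settings.
SAMPLE = "<config><log><level>info</level></log></config>"


def set_log_level(xml_text, new_level="verbose"):
    """Return xml_text with every <log><level> value set to new_level."""
    root = ET.fromstring(xml_text)
    changed = 0
    # iter() also yields the root itself if it happens to be <log>.
    for log in root.iter("log"):
        level = log.find("level")
        if level is not None:
            level.text = new_level
            changed += 1
    if changed == 0:
        raise ValueError("no <log><level> element found")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_log_level(SAMPLE))
```

The function works on a string rather than a path so that it can be tried safely away from the host; the same "at your own risk" caveat applies to writing the result back.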
Let’s take a look at the following example:
Host H1 with VM1 on Logical switch associated with the VNI 5000
Host H2 with VM2 on the same VNI 5000
All tables/caches are empty: H1 and H2 ARP and MAC cache; VM1 and VM2 ARP tables:
H1:
~ # net-vdl2 -M arp -s Compute_VDS -n 5000
ARP entry count: 0
~ # net-vdl2 -M mac -s Compute_VDS -n 5000
MAC entry count: 0
H2:
~ # net-vdl2 -M arp -s Compute_VDS -n 5000
ARP entry count: 0
~ # net-vdl2 -M mac -s Compute_VDS -n 5000
MAC entry count: 0
Now, VM1 issues two ping packets to VM2.
In the netcpa log file we observe:
On H1:
The ARP request is captured by SwSec and forwarded as a Query to the Controller, asking whether it knows the MAC corresponding to the IP in the request:
--> ARP (v4) Query: len = 16
--> SwitchID:0, VNI:5000
--> Num of entries: 1
--> #0 VM IP:172.16.10.12
Since this is an ARP request, SwSec also generates a VM IP Update for VM1, which is sent to the Controller, telling it about VM1’s IP and MAC:
--> VM IP (v4) Update: len = 24
--> SwitchID:0, VNI:5000
--> Num of removed entries: 0
--> Num of added entries: 1
--> #0 VM IP:172.16.10.11 VM MAC:00:50:56:a6:7a:a2
In our case, the Controller didn’t have any cached ARP info for VM2, so it responds with “I don’t know” to SwSec’s ARP Query:
--> ARP (v4) Update: len = 32
--> SwitchID:0, VNI:5000
--> Num of entries: 1
--> #0 VM IP:172.16.10.12 VM MAC:ff:ff:ff:ff:ff:ff VTEP IP:255.255.255.255 VTEP MAC:ff:ff:ff:ff:ff:ff
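When reading a lot of these entries, it can help to pull the IP/MAC pairs out mechanically. The following is a rough Python sketch that relies only on the "VM IP:" and "VM MAC:" fields exactly as they appear in the excerpts above; the function names are made up for illustration, and the real log may contain other message layouts that this pattern does not cover.

```python
import re

# Matches "VM IP:<ipv4>" optionally followed by "VM MAC:<mac>".
# "VTEP IP:" / "VTEP MAC:" fields are deliberately not matched.
ENTRY_RE = re.compile(
    r"VM IP:(?P<ip>\d+\.\d+\.\d+\.\d+)"
    r"(?:\s+VM MAC:(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}))?",
    re.IGNORECASE,
)

# All-ones MAC is the Controller's "I don't know" answer in this post.
UNKNOWN_MAC = "ff:ff:ff:ff:ff:ff"


def parse_entries(text):
    """Return a list of (ip, mac_or_None) for each 'VM IP:' entry."""
    return [(m.group("ip"), m.group("mac")) for m in ENTRY_RE.finditer(text)]


def is_unknown(mac):
    """True if the MAC is the all-ones 'unknown' marker."""
    return mac is not None and mac.lower() == UNKNOWN_MAC


if __name__ == "__main__":
    line = ("--> #0 VM IP:172.16.10.12 VM MAC:ff:ff:ff:ff:ff:ff "
            "VTEP IP:255.255.255.255 VTEP MAC:ff:ff:ff:ff:ff:ff")
    for ip, mac in parse_entries(line):
        print(ip, mac, "unknown" if is_unknown(mac) else "known")
```

An ARP Query entry carries only an IP, so its MAC comes back as None rather than as the "unknown" marker.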
Meanwhile on H2:
Since VM2 (running Linux) will send an ARP to VM1 to get an authoritative confirmation of VM1’s MAC address, SwSec on H2 will catch it and generate a VM IP Update for VM2:
--> VM IP (v4) Update: len = 24
--> SwitchID:0, VNI:5000
--> Num of removed entries: 0
--> Num of added entries: 1
--> #0 VM IP:172.16.10.12 VM MAC:00:50:56:a6:a1:e3
As a result, tables on H1 and H2 are updated as follows:
H1:
~ # net-vdl2 -M mac -s Compute_VDS -n 5000
MAC entry count: 1
 Inner MAC: 00:50:56:a6:a1:e3 <== VM2
 Outer MAC: 00:50:56:62:b2:26
 Outer IP: 192.168.250.53
 Flags: 7
~ # net-vdl2 -M arp -s Compute_VDS -n 5000
ARP entry count: 1
 IP: 172.16.10.12
 MAC: ff:ff:ff:ff:ff:ff <== “I don’t know” that came from Controller
 Flags: F
H2:
~ # net-vdl2 -M mac -s Compute_VDS -n 5000
MAC entry count: 1
 Inner MAC: 00:50:56:a6:7a:a2 <== VM1
 Outer MAC: 00:50:56:65:15:14
 Outer IP: 192.168.250.52
 Flags: 7
~ # net-vdl2 -M arp -s Compute_VDS -n 5000
ARP entry count: 0 <== Nothing learned, since H2 didn't query the Controller, as per logs above
And on the Controller we see the ARP cache populated by the two VM IP Updates (which will time out in 180 seconds):
nvp-controller # show control-cluster logical-switches arp-table 5000
VNI      IP              MAC                 Connection-ID
5000     172.16.10.12    00:50:56:a6:a1:e3   6
5000     172.16.10.11    00:50:56:a6:7a:a2   7
Now, the above prompts a few questions.
The VXLAN SwSec module generates a VM IP Update in two cases:
- When it receives an ARP request from a VM; and
- When it is about to send a DHCP ACK from a DHCP server to a VM
According to the logs above, H2 has sent a VM IP Update to the Controller, but there are no corresponding ARP Query and ARP Update messages. What’s going on?
Let’s have a look at what is coming out of the VM2 during our ping operation:
16:00:08.484523 00:50:56:a6:a1:e3 > 00:50:56:a6:7a:a2, ethertype ARP (0x0806), length 60: Reply 172.16.10.12 is-at 00:50:56:a6:a1:e3, length 46 <== Replied to ARP from VM1
16:00:08.485879 00:50:56:a6:a1:e3 > 00:50:56:a6:7a:a2, ethertype IPv4 (0x0800), length 98: 172.16.10.12 > 172.16.10.11: ICMP echo reply, id 36462, seq 1, length 64 <== Replied to ping #1 from VM1
16:00:09.465441 00:50:56:a6:a1:e3 > 00:50:56:a6:7a:a2, ethertype IPv4 (0x0800), length 98: 172.16.10.12 > 172.16.10.11: ICMP echo reply, id 36462, seq 2, length 64 <== Replied to ping #2 from VM1
16:00:13.493710 00:50:56:a6:a1:e3 > 00:50:56:a6:7a:a2, ethertype ARP (0x0806), length 60: Request who-has 172.16.10.11 tell 172.16.10.12, length 46 <== Sent a unicast ARP for VM1's MAC
So the difference from what happens on H1 is that the ARP request coming from VM2 has a unicast destination MAC (VM1's MAC rather than ff:ff:ff:ff:ff:ff).
Due to this subtle difference, SwSec module on H2 does the following:
- It generates a VM IP Update (since it has seen an ARP coming from a VM); but
- Does not generate an ARP Query to the Controller (and thus doesn’t see a corresponding ARP Update).
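The behaviour observed on H1 and H2 can be summarised as a small decision function. This is only a Python model of what the logs in this post show, not SwSec's actual code; the function name is invented, and it ignores everything else the module may take into account.

```python
BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


def messages_for_arp_request(dst_mac):
    """Model which control-plane messages a VM's ARP request triggers.

    dst_mac is the destination MAC of the Ethernet frame carrying the
    ARP request, as seen in the packet capture above.
    """
    messages = []
    # Only a broadcast ARP request was seen to produce an ARP Query (H1).
    if dst_mac.lower() == BROADCAST_MAC:
        messages.append("ARP Query")
    # Any ARP request from a VM was seen to produce a VM IP Update (H1 and H2).
    messages.append("VM IP Update")
    return messages


if __name__ == "__main__":
    print(messages_for_arp_request("ff:ff:ff:ff:ff:ff"))  # VM1's broadcast ARP
    print(messages_for_arp_request("00:50:56:a6:7a:a2"))  # VM2's unicast ARP
```

The order of the returned list simply mirrors the order of the H1 log excerpts and is not meant to imply timing.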
Update Jan 2015: In NSX-v 6.1, there is an option to disable ARP suppression per Logical Switch. At the time of writing, this is implemented by setting the Controller to ignore the VM IP Update messages sent to it by hosts for that LS.
What you will see is that hosts continue sending VM IP Update and ARP Query messages to the Controller, but the Controller’s ARP table stays empty, causing it to respond to any ARP Query with ff:ff:ff:ff:ff:ff.
Somewhat confusingly, the output of the esxcli network vswitch dvs vmware vxlan network list --vds-name [DVS_Name] command will still show “ARP proxy” against the Control Plane of the Logical Switch’s VNI, even if its ARP suppression is disabled.
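To make the Controller side concrete, here is a toy Python model of its per-VNI ARP table as described in this post: entries are learned from VM IP Updates, time out after 180 seconds, unknown IPs are answered with ff:ff:ff:ff:ff:ff, and with suppression disabled the updates are ignored. The class and method names are hypothetical, and whether a refresh resets the timer is an assumption of this sketch.

```python
import time

UNKNOWN_MAC = "ff:ff:ff:ff:ff:ff"
ENTRY_TTL_SECONDS = 180  # timeout quoted earlier in the post


class ControllerArpTable:
    """Toy model of one VNI's ARP table on the Controller."""

    def __init__(self, suppression_enabled=True, clock=time.monotonic):
        self.suppression_enabled = suppression_enabled
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # ip -> (mac, time the entry was learned)

    def vm_ip_update(self, ip, mac):
        # With suppression disabled, updates are ignored and the table stays empty.
        if not self.suppression_enabled:
            return
        # Assumption: a repeated update overwrites the entry and restarts its timer.
        self._entries[ip] = (mac, self._clock())

    def arp_query(self, ip):
        """Return the known MAC, or the all-ones 'I don't know' answer."""
        entry = self._entries.get(ip)
        if entry is None:
            return UNKNOWN_MAC
        mac, learned_at = entry
        if self._clock() - learned_at > ENTRY_TTL_SECONDS:
            del self._entries[ip]
            return UNKNOWN_MAC
        return mac


if __name__ == "__main__":
    table = ControllerArpTable(suppression_enabled=False)
    table.vm_ip_update("172.16.10.11", "00:50:56:a6:7a:a2")
    print(table.arp_query("172.16.10.11"))
```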
January 12th, 2015 at 7:53 pm
Hi Dimitri,
The last packet (the ARP request from VM2 to VM1) is normal. This ARP request is sent after the ICMP replies, i.e. after the communication with VM1 is over.
I have seen this between two hosts running Linux/Windows in a non-virtualized environment, where the last packet is an ARP request after the ping reply.
http://serverfault.com/questions/81651/strange-why-does-linux-respond-to-ping-with-arp-request-after-last-ping-reply
I would like to know where the ARP suppression is happening.
According to my understanding, ARP suppression happens on the controller side. For example: once we create the logical switch and add VM1 and VM2 on Host1 and Host2 respectively, each host pushes the port information of the switch it is connected to to the controller, and the controller updates this info and sends it to all other hosts. In this way, each ESXi host builds its MAC table.
Once the logical switch is created, the controller knows the MAC addresses of the VMs in its ARP table and the MAC addresses of the hosts in the MAC/CAM table.
When we ping from VM1 to VM2, VM1 sends an ARP request to get the MAC of VM2; since the controller maintains this information, it responds with the MAC address of VM2.
Please correct me if I am wrong.
January 12th, 2015 at 8:41 pm
Hi Harish,
ARP suppression is performed by the DVS Switch Security dvFilter.
Remember that the ARP table is a map of VMs’ IP addresses to their MAC addresses. The Controller can’t know the IP address part until the VM sends an ARP request or receives a DHCP ACK, and the SwSec module intercepts this communication and reports it to the Controller.
Overall, SwSec dvFilter is responsible for:
– Intercepting VM’s ARP request
– Reporting VM’s IP to the Controller
– Checking for answer to VM’s ARP request in local ARP cache
– If nothing is found, asking Controller if it has an ARP reply
– Storing Controller’s answer in a local ARP cache
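The five responsibilities listed above can be strung together as one handler. This is a Python sketch of that sequence only, with made-up names; the two callbacks stand in for the VM IP Update and ARP Query messages, and how the real module treats a cached “I don’t know” answer is not covered here.

```python
UNKNOWN_MAC = "ff:ff:ff:ff:ff:ff"


def handle_arp_request(src_ip, src_mac, target_ip,
                       local_cache, report_vm_ip, query_controller):
    """Model of the listed steps for one intercepted ARP request.

    local_cache is a dict of ip -> mac; report_vm_ip and query_controller
    are callables standing in for messages sent to the Controller.
    """
    # Steps 1-2: the request has been intercepted; report the sender's IP/MAC.
    report_vm_ip(src_ip, src_mac)
    # Step 3: look for an answer in the local ARP cache.
    if target_ip in local_cache:
        return local_cache[target_ip]
    # Step 4: nothing cached, so ask the Controller.
    answer = query_controller(target_ip)
    # Step 5: store whatever the Controller said, including "I don't know".
    local_cache[target_ip] = answer
    return answer


if __name__ == "__main__":
    cache = {}
    reported = []
    mac = handle_arp_request(
        "172.16.10.11", "00:50:56:a6:7a:a2", "172.16.10.12",
        cache, lambda ip, m: reported.append((ip, m)), lambda ip: UNKNOWN_MAC)
    print(mac, cache, reported)
```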
Also, the only VXLAN-related data that’s proactively pushed from Controllers to hosts is a list of VTEPs for each VNI. All other info has to be requested by host.
Hope this helped.
April 27th, 2016 at 8:24 am
Hi Dmitri,
I assume that the SwSec dvFilter inspecting ARP requests and DHCP ACKs depends on the configuration of those features? In SpoofGuard or on each logical switch it is possible to configure the IP Discovery type; this is the same feature, correct?
Kyle
April 27th, 2016 at 5:19 pm
Hi Kyle,
Thanks for the comment. IP Discovery and SwSec’s IP learning functions are separate, independent, and serve different purposes. Yes, I agree it’s a bit confusing.
SwSec learns IP to MAC mapping so that VXLAN ARP suppression function can work.
IP Discovery function, which can *also* use ARP requests and/or DHCP replies, serves NSX DFW’s need to map vCenter objects to IP addresses. Remember, L3/L4 DFW rules operate on IP addresses in the end.
These two functions are done in different dvFilters – SwSec’s one in, well, SwSec dvFilter, and IP Discovery in SFW (DFW) dvFilter, in particular as part of Spoofguard.
Hope this makes sense.
January 7th, 2017 at 2:54 am
Hello. I tested ARP suppression on version 6.2.4. I used a Python script with the scapy module to generate a unicast ARP request. As soon as the script generates the ARP request packet, the controller gets a new ARP record. The record corresponds to the source part of the ARP packet.
January 7th, 2017 at 7:32 am
Hi Vladimir,
Yes, this is correct. What you’re observing is ARP request triggering SwSec MAC to IP learning process, causing it to generate VM IP (v4) Update. You can see this described in this post, where I walk through a ping process. The fact that you’re sending a Unicast ARP makes no difference. 🙂
HTH 🙂
January 7th, 2017 at 6:07 pm
Thank you, i got it.
This is a really great post!
Are you planning to explain DLR ARP resolution process?
January 7th, 2017 at 6:11 pm
Oh, the whole DLR thing.. 🙂 This is a big topic, and I’ve been toing and froing about writing it up for a while. There’s a lot of info out there as it is, so I’m not entirely sure how useful this would be, hence the hesitation. 🙂