DC network vs. vSwitches and Network Team vs. Virtualisation Team
Judging by blog posts I’ve seen of late, one of the questions on the minds of “cloud builders” today is “who owns and is responsible for the soft switch, the one that resides in hypervisor hosts?”
So far I’ve seen no clear-cut answer. This blog post is my take on it in light of network virtualisation.
For the purpose of this post, we are talking about a Data Centre that mainly hosts infrastructure dedicated to delivering “typical” IaaS for traditional, “old world” Enterprise applications. As in, a bunch of servers running some sort of hypervisors, some storage (SAN or NAS), and the usual assortment of load balancers, firewalls, IPS/IDS, etc. It can be an Enterprise’s own private cloud, or a service provider’s facility.
The two types of DC network connectivity
The way I see it, there are two major types of network connectivity present within a DC that hosts virtualised infrastructure:
- “Transport”, which provides connectivity for the infrastructure itself, like vkernel/vMotion, hypervisor storage connectivity, management, and so on; and
- “Service”, which forms part of individual VMs or vApps, connecting them between themselves or to their end users outside the DC.
How these two types are provisioned and managed is very different.
The former is managed as part of a planned activity and changes rarely, while the latter is born to be messed with – to be dynamically created, managed and destroyed by automation tools.
It’s not a problem if a transport service has to cross equipment from different manufacturers – it is quite feasible to build it by hand, and while automation helps, it is not critical. The latter kind is trickier, of course, and here automation is a must because of the “cloud promise” – elastic, on demand.
So, who’s your daddy, and what does he do?
The characteristics of transport services align them well with the world of a typical network team: they are planned, designed, configured, and then fed and groomed till their time comes to an end.
But now, here come the services of the second kind, and the trouble begins. Because the vSwitch is a key part of the majority of these services, there are three major possible scenarios:
- Control over vSwitches is handed over to the Network Team. They now have to (a) automate service management across their traditional domains (switches/routers); plus (b) accommodate a new, additional domain (vSwitches). In a word – pain.
- The virtualisation team looks after vSwitches. The Network Team is not in a very good position either: they still have to automate service management across their traditional domain to keep up with the “cloud pace”, but additionally they now have to prevent the virtualisation guys from shooting themselves in the foot. Double the pain points.
- The network is virtualised. The “traditional” DC network provides “Transport” connectivity for an overlay “Service” network, with end-points on vSwitches sitting in hypervisor hosts and in network edge devices. The overlay network is controlled by the virtualisation guys, and there’s nothing new for the Network Team to do.
The Third Scenario
So, we’re back to “digging tunnels”. VXLAN is one example; if I understand correctly what Nicira is doing, their Network Hypervisor is another. (This blog post is not a comparison between the two; however, it is probably worth noting that, as far as I know, Nicira’s solution has the upper hand here: it does not need IP multicast in the transport network, and apparently an actual implementation in a physical switch, made by one of their partners, is coming very soon.)
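To give the “digging tunnels” idea a concrete flavour, here is a minimal sketch of stitching two Open vSwitch instances together over the transport network with a VXLAN tunnel, using the standard `ovs-vsctl` tool. The bridge name, the remote IP address and the VNI (segment ID) are made up for illustration; the point is that the overlay link is created entirely on the vSwitch, with nothing configured on the physical DC network:

```shell
# On host A: create the overlay bridge the VMs will attach to
ovs-vsctl add-br br-int

# Add a VXLAN tunnel port towards host B (10.0.0.2 is an example
# transport-network address; key 5001 is an arbitrary example VNI)
ovs-vsctl add-port br-int vx0 -- set interface vx0 type=vxlan \
    options:remote_ip=10.0.0.2 options:key=5001
```

A mirror-image command on host B (with `remote_ip` pointing back at host A) completes the link, and as far as the two bridges are concerned they are now directly connected.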
Why do I think this third scenario is good?
One, as I noted above, it gives the Network Team peace of mind and takes off the pressure to automate their domain, which may now not be really necessary. Potential major savings in time and money.
Two, it gives the virtualisation team the flexibility and speed they need – in Nicira’s example, if I understand it correctly, the OpenFlow protocol can be used to program a constellation of their Open vSwitches, to establish individual services and potentially do other kinds of OpenFlow magic.
Imagine that you have Open vSwitch in your hypervisor hosts and your ToR switches, where you connect your appliances (like IPS/IDS, VPN gateways, etc.) or where your end-customers connect their private WAN links. The Network Team has provisioned a VLAN or a VRF for you that connects all your hosts and edge devices together in a redundant manner, so you are assured that the underlying DC network will take care of protecting this connectivity and deal with bandwidth requirements. Your Open vSwitches have established connectivity between themselves, and as far as they are concerned, they are all directly connected to each other.
Now, to provision an IaaS service, your automation/orchestration system makes API calls to instantiate necessary network links in your OpenFlow controller, which pushes configuration to the relevant Open vSwitches. Done. No need to configure anything on the physical DC network – no services to create, no VLANs to consume. Need to extend to another DC? No problem, just organise the connectivity.
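To make the orchestration step above a little more tangible, here is a minimal Python sketch of what such an API call might carry. The function, field names and identifiers are all hypothetical – there is no real controller API behind them – but they illustrate the shape of the request an automation system would send to the OpenFlow controller, which then programs the relevant Open vSwitches:

```python
import json

# Hypothetical payload builder: the resource and field names below are
# illustrative only, not a real controller product's API.
def build_link_request(tenant_id, vswitch_a, vswitch_b, segment_id):
    """Describe an overlay link between two vSwitches for the controller."""
    return {
        "tenant": tenant_id,
        "endpoints": [vswitch_a, vswitch_b],
        "encapsulation": "vxlan",     # or GRE/STT, depending on the fabric
        "segment_id": segment_id,     # overlay ID; no physical VLAN consumed
    }

# The orchestration system would POST this to the controller; the
# controller pushes the resulting flows/ports to the Open vSwitches.
request = build_link_request("tenant-42", "host-a/br-int", "host-b/br-int", 5001)
print(json.dumps(request, indent=2))
```

Note that nothing in the request references the physical DC network – the transport stays untouched, which is exactly the point of the third scenario.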
There is no free cheese
Of course there isn’t. Tunnels mean less transparency for the underlying infrastructure. But why do you need it there? You can always punt a service out to an edge switch, run it through an IPS/IDS/DPI/whatever, and pop it back in.
There are, of course, other considerations regarding where network virtualisation adds value and where it takes it away. I’m not saying things are black and white. I just believe that network abstraction, with two layers (transport and service), is ultimately the way to go.
So, whose vSwitch is it, anyway?
In light of the argument outlined above, the answer in my mind is clear – if the network is virtualised, then the vSwitch is to be controlled by the virtualisation team.
Comments are, as usual, very welcome!