There are plenty of arguments going around about how making network equipment “dumb” and programming it from a centralised location will save a ton. Well, save a ton it may, but I would like to focus on what might be lost or compromised by doing so.
As an aside: looking at the evolution of the OF protocol, the amount of “stuff” that switches will need to be able to do to support it is increasing quite rapidly. I wonder how long it will take for the complexity to be back in the devices in force, just under a different banner?
From my reading of the OF spec v1.2 (BTW, why is it still unavailable on the openflow.org website for download?), there is a definite reliance on the functionality of the switch itself (or “vendor extensions”) to determine the state of ports and links. This means that there is a dependency on the switches playing along nicely with each other for the purpose of link state detection. LLDP, BFD, L-OAM? Mmm, “dumb”, indeed.
Yes, it is possible to implement a checking mechanism that would go something like “controller -> switch A -> port/link to switch B -> controller”, but the feasibility of this will depend on how quickly you want to detect failures, which translates into how often you will need to check, and on how many links you need to monitor.
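To put rough numbers on that, here is a back-of-the-envelope sketch in Python. Everything in it is my own assumption for illustration rather than anything taken from the OF spec: the function name `probe_load`, the rule that a link is declared down after three consecutive lost probes, and the idea that each probe costs the controller one packet-out towards switch A and one packet-in back from switch B.

```python
def probe_load(num_links, detect_within_s, missed_probes=3, bidirectional=True):
    """Estimate the controller load for controller-driven link checking.

    Assumption: a link is declared down after `missed_probes` consecutive
    probes are lost, so a probe must go out every
    detect_within_s / missed_probes seconds on every monitored direction.
    Returns (probe interval in s, probes per second, control messages per second).
    """
    if num_links < 0 or detect_within_s <= 0 or missed_probes < 1:
        raise ValueError("need num_links >= 0, detect_within_s > 0, missed_probes >= 1")

    interval_s = detect_within_s / missed_probes
    directions = 2 if bidirectional else 1
    probes_per_s = num_links * directions / interval_s

    # Assumption: one packet-out to inject the probe, one packet-in to collect it.
    control_msgs_per_s = probes_per_s * 2
    return interval_s, probes_per_s, control_msgs_per_s


if __name__ == "__main__":
    # A relaxed target: 500 links, failures detected within 1.5 seconds.
    print(probe_load(500, 1.5))
    # A tighter target: the same 500 links, failures detected within 50 ms.
    print(probe_load(500, 0.05))
```

Under these assumptions the relaxed 1.5 second target is cheap, while the 50 ms target on the same 500 links works out to roughly 60,000 probes and 120,000 control messages per second, before the controller has done anything else.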
End to end connectivity fault management
So, now that we know that link/port fault detection is likely a responsibility of the switch, we realise that we had better have an ability to verify whether our forwarding paths are actually forwarding as expected. However, presently I am not aware of any provisions for end-to-end (i.e., across multiple switches and links) connectivity monitoring for OF-based forwarding functions.
As with the resiliency above, it is entirely possible to implement a sort of connectivity fault management on the controller, but the feasibility will again depend on the number of flow paths to be checked.
As with the other two, it is theoretically possible to implement transmission performance monitoring on the controller. It could generate probe messages and send them to switches for insertion into forwarding paths, then pick them up at the other end and/or along the way. However, it also presents a couple of challenges – the same old scalability, as you will burn your controller’s cycles and control path bandwidth, and one new one – accuracy, as the results of your measurements will include delays introduced by your control paths.
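The accuracy point is easy to see with a little arithmetic. The sketch below is a toy model, not a measurement tool: the function names and the millisecond figures are made up for illustration, and it assumes the controller timestamps the probe when it sends it and again when it gets it back.

```python
def controller_view_ms(data_path_ms, to_ingress_ms, from_egress_ms):
    """Delay the controller observes for a probe it injects and collects itself.

    The probe travels controller -> ingress switch (control path),
    ingress -> egress (the forwarding path we actually care about),
    then egress switch -> controller (control path again).
    """
    return to_ingress_ms + data_path_ms + from_egress_ms


def overestimate_factor(data_path_ms, to_ingress_ms, from_egress_ms):
    """How many times larger the observed delay is than the real data path delay."""
    if data_path_ms <= 0:
        raise ValueError("data_path_ms must be positive")
    observed = controller_view_ms(data_path_ms, to_ingress_ms, from_egress_ms)
    return observed / data_path_ms


if __name__ == "__main__":
    # Hypothetical figures: 2 ms across the forwarding path,
    # 5 ms and 7 ms on the two control path legs.
    print(controller_view_ms(2.0, 5.0, 7.0))
    print(overestimate_factor(2.0, 5.0, 7.0))
```

With these made-up figures a 2 ms forwarding path looks like 14 ms from the controller's seat, seven times the real value. You could try to subtract an estimate of the control path delay, but any variation on the control path would still show up as jitter in the result.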
Depending on the size of your network and the number of forwarding paths for which you would like to monitor transmission performance, potentially times the number of classes of service, it may not be a problem. But to me it definitely does look like a rake lying hidden in the grass.
…which brings us to the
Intelligent traffic flow management
One of the selling points of the centralised forwarding coordination is the ability to react to changing network conditions in real-time, and redirect traffic flows accordingly. While I agree that a fair bit can be gleaned from the flow stats (i.e., how much traffic is being forwarded), I think that a much bigger and juicier slice of a pie is being left on the plate.
The utilisation of a link or a flow says almost nothing about the experience of the traffic within this flow. Think of it as if you were monitoring a highway toll booth, and say you are seeing only a few cars going through, which zip away happily into the distance; however, what you do not see is a massive accident a few kilometers away that has blocked all lanes except one, and a build-up of cars trying to get past it. The next toll booth past the zone of the accident will also see only a few cars, and it too will know nothing of the fact that these few cars spent an hour travelling a distance normally covered in ten minutes.
The scenario above is not very likely when your links are distinct pieces of fibre, but it is very much possible if they are, say, WAN links provided to you by a third party, and having connectivity fault management and performance monitoring would help a lot here.
So, why should I care?
If you are a service provider who cares about customers’ experience (and boy, does caring about your customers pay!), an ability to measure service experience from their perspective, report on it, and especially take pre-emptive action by catching early signs of trouble goes a really long way. Your good old trusty friends there are Service OAM (S-OAM) and Performance Management (PM).
The trouble is, these things are often quite resource-intensive, and attempting to centralise them could make that problem more acute.
If all you’re offering is a best effort service, “utility” or not, going the route of dumbing things down to save a buck may be a valid route, especially if that best effort service isn’t what makes the bulk of your company’s profits.
Otherwise, think through the possible trade-offs and weigh pros and cons for your particular situation carefully – don’t get blind-sided by a promise of savings. After all, value (as in “money that your vendors extract from you”) almost never “magically” disappears from the stack. It simply moves somewhere else, in some cases making you pay more dearly than before.