From the arcane knowledge dept: standby links in LAG

Summary for the impatient

The Link Aggregation Control Protocol (LACP) can be used to create link aggregation groups with both active and standby links in them. Those standby links can be made active when one or more of the previously active links fail or are removed from the group.

“The special magic” required for this to work needs to be done by only one side of the LAG; the only requirement for the “other” side is that it supports standard LACP.

The story behind the post

A few days ago I attended an industry event, where I had a chat with a couple of senior technical peeps looking after a certain Vendor’s line of server products.

As I am currently in the middle of orchestrating the deployment of an infrastructure project that involves these server products, I took the opportunity to get some “from the horse’s mouth” clarifications regarding network uplink connectivity. In particular, I wanted some comfort around the recommended configuration of the uplink connections when using multichassis Link Aggregation Groups (MC-LAG), because the upstream network equipment we use in this project is not from the same Vendor as the servers.

The point was that there are a couple of ways to do MC-LAG. There’s the “common” way, where all member links, irrespective of which upstream device they are connected to, are in the “active” forwarding state; and there’s an alternative, where the link or links to one of the upstream devices are “active”, while the link or links to the other are kept in “standby”.

When I asked those gents what the preferred operating mode for their product was, “active-active” or “active-standby”, I was met with a look that’s typically reserved for those special awkward situations. You know, when you’re talking with a customer, and they say something which, um, isn’t really right. I tried to explain how LACP is used to manage the active-standby arrangement, but it was obvious that it didn’t quite connect. The fact that I had read the standard quite a while ago and didn’t remember what the critical components were called didn’t help.

Anyway, at this point I realised that the bridge between the common and the arcane had been crossed, and this post was born.

Digging a tiny bit deeper

I first learned about this neat trick when working with Alcatel-Lucent’s Service Routers (7750 SR). They use it as part of their MC-LAG solution. Here is a page that explains how their solution works. What it doesn’t cover, however, is how the active/standby member link status is managed.

For this, we need to refer to the IEEE 802.1AX standard. To work out what we are after, we need to look at a few sections of the document.

A good place to start is clause B.3, which provides a clue as to how links end up in the “standby” state:

B.3 Standby link selection

Every link between systems operating LACP is assigned a unique priority. This priority comprises (in priority order) the System Priority, System ID, Port Priority, and Port Number of the higher-priority system. In priority comparisons, numerically lower values have higher priority.

Ports are considered for active use in an aggregation in link priority order, starting with the port attached to the highest priority link. Each port is selected for active use if preceding higher priority selections can also be maintained; otherwise, the port is selected as standby. (Emphasis added)
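To make the ordering in B.3 a bit more concrete, here is a minimal Python sketch. It builds each link’s priority as the tuple (System Priority, System ID, Port Priority, Port Number) of the higher-priority system and then walks the links best-first. The class and function names are made up for illustration, and the “can also be maintained” test is modelled, purely as an assumption, as a simple cap on the number of active links (max_active); a real system may well have other constraints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    priority: int  # LACP System Priority; numerically lower = higher priority
    mac: str       # System ID as a MAC address, e.g. "00:00:00:00:00:01"

    def key(self):
        # Compare System Priority first, then the System ID as a number
        return (self.priority, int(self.mac.replace(":", ""), 16))


@dataclass(frozen=True)
class Link:
    name: str
    actor_port: tuple    # (Port Priority, Port Number) on the actor side
    partner_port: tuple  # (Port Priority, Port Number) on the partner side


def link_priority(link, actor, partner):
    """B.3: (System Priority, System ID, Port Priority, Port Number)
    of whichever system has the higher priority (lower key)."""
    if actor.key() < partner.key():
        return actor.key() + link.actor_port
    return partner.key() + link.partner_port


def select_links(links, actor, partner, max_active):
    """Walk links best-first; a link goes active only if the earlier
    selections can still be maintained (assumed here: at most
    max_active active links), otherwise it is selected as standby."""
    active, standby = [], []
    for link in sorted(links, key=lambda lk: link_priority(lk, actor, partner)):
        if len(active) < max_active:
            active.append(link.name)
        else:
            standby.append(link.name)
    return active, standby


mc_lag = System(100, "00:00:00:00:00:01")    # hypothetical MC-LAG pair
server = System(32768, "00:00:00:00:00:02")  # hypothetical plain LACP server
links = [
    Link("a", (100, 1), (32768, 4)),
    Link("b", (100, 2), (32768, 3)),
    Link("c", (200, 3), (32768, 2)),
    Link("d", (200, 4), (32768, 1)),
]
print(select_links(links, mc_lag, server, max_active=2))
# (['a', 'b'], ['c', 'd'])
```

Only the higher-priority system’s port priorities matter here. Give the server System Priority 1 instead, and its port priorities decide the order, which with the values above makes d and c the active pair.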

Boom. And so:

  • The higher-priority system (in the case of MC-LAG, a distributed system) assigns a priority to each link in the group such that the links it wants “active” have higher priority, and those it wants in “standby” have lower priority (or priorities);
  • …and then it uses its MC-LAG-aware Selection Logic to decide whether “preceding higher priority selections can also be maintained”, in effect “agreeing” with the lower-priority system to activate only the links it needs, while keeping the links it wants in standby eligible for inclusion in the group (by assigning them the necessary Operational Keys);
  • In response to a link state change event, such as an active link failure, the MC-LAG-aware Selection Logic is executed again, activating a standby link.
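The three steps above can be sketched as a small, hypothetical model of MC-LAG-aware Selection Logic. This is not how Alcatel-Lucent (or anyone else) actually implements it: the Member class, the chassis labels and the “same chassis as the links already active” rule are all assumptions, chosen to mirror the active-standby arrangement described earlier. Rerunning the selection after a link state change is what turns standby links active.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    chassis: str        # which MC-LAG peer the link terminates on (hypothetical label)
    port_priority: int  # set by the higher-priority (MC-LAG) system; lower = preferred
    port_number: int
    up: bool = True     # operational state of the physical link


def mc_lag_select(members):
    """Hypothetical MC-LAG-aware Selection Logic.

    Links are considered in B.3 priority order. A link joins the active set
    only if the selections already made can be maintained, which this sketch
    takes to mean "it lands on the same chassis as the links already active".
    Every other operational link is selected as standby.
    """
    active, standby = [], []
    active_chassis = None
    for m in sorted(members, key=lambda x: (x.port_priority, x.port_number)):
        if not m.up:
            continue  # failed links are neither active nor standby
        if active_chassis in (None, m.chassis):
            active_chassis = m.chassis
            active.append(m.name)
        else:
            standby.append(m.name)
    return active, standby


members = [
    Member("a1", "A", 100, 1),
    Member("a2", "A", 100, 2),
    Member("b1", "B", 200, 3),
    Member("b2", "B", 200, 4),
]
print(mc_lag_select(members))  # (['a1', 'a2'], ['b1', 'b2'])

# Link state change event: both links to chassis A fail, so rerun selection.
members[0].up = False
members[1].up = False
print(mc_lag_select(members))  # (['b1', 'b2'], [])
```

In this particular model the B links only take over once chassis A has no working links left; exactly when to switch over is a policy choice, and other rules are equally possible.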

And what’s cool is that the lower-priority system does not need to know anything about the MC-LAG-aware Selection Logic for this to work!
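Why doesn’t the other side need to know? In 802.1AX, the Mux machine only moves a port on to collecting and distributing once the port is selected locally and the partner is advertising Synchronization = TRUE for that link; a port held in standby never gets attached, so it never advertises it. The sketch below is a heavily simplified model of that handshake (no timers, no Ready variable, a single LACPDU exchange), and the Port class and helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Port:
    selected: str               # local Selection Logic result: "SELECTED" or "STANDBY"
    partner_sync: bool = False  # Synchronization bit last received from the partner


def actor_sync(port):
    # Simplified Mux machine: a port reaches ATTACHED (and advertises
    # Synchronization = TRUE) only when its Selection Logic marks it SELECTED.
    # A STANDBY port stays in WAITING with Synchronization = FALSE.
    return port.selected == "SELECTED"


def distributing(port):
    # Leaving ATTACHED for collecting/distributing also needs the partner's Sync.
    return port.selected == "SELECTED" and port.partner_sync


def exchange(mc_lag_port, server_port):
    # One round of LACPDUs: each side records the other's Synchronization bit.
    mc_lag_port.partner_sync = actor_sync(server_port)
    server_port.partner_sync = actor_sync(mc_lag_port)
    return distributing(mc_lag_port), distributing(server_port)


# The server runs plain LACP with no limits of its own, so it selects every link.
print(exchange(Port("SELECTED"), Port("SELECTED")))  # (True, True)
print(exchange(Port("STANDBY"), Port("SELECTED")))   # (False, False)
```

The server side happily selects the standby link, but without the partner’s Synchronization bit it never starts using it, which is all the “agreement” the lower-priority system has to take part in.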

Hope this was helpful 🙂


About Dmitri Kalintsev

Some dude with a blog and opinions ;)
