Making Redundancy Real

There’s a difference between having redundancy and actually using it.

I came across this at a London site where around 350 users had recently been moved into a new office space. On paper, the network design was solid.

Two core switches.
Multiple VLANs for data and voice.
Spanning Tree in place.
HSRP providing gateway resilience.

Everything you’d expect.

But in practice, it wasn’t behaving the way the design intended. All traffic, across all VLANs, preferred a single core. The same switch was:

the Spanning Tree root
the HSRP active node

Which meant every packet flowed through it.

Technically, there was failover.

If that core died, the other would take over. But until that happened, half the infrastructure was effectively idle. The goal wasn’t to redesign anything. It was to make better use of what was already there.

The approach was simple in principle.

Split responsibility across the two cores. Let one core take ownership for some VLANs, and the other take the rest. Specifically:

Move the Spanning Tree root for certain VLANs to the secondary core
Move the HSRP active role for those same VLANs to match

So instead of one core doing everything, both cores would actively handle traffic.

From the change plan:

“The purpose of this work is to split the VLAN traffic across both cores equally”

In practice, that meant adjusting two key things.

HSRP.

The active gateway is determined by priority, so by lowering priority on one core and raising it on the other, you can control which device becomes active for each VLAN.
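
To make the priority rule concrete, here is a minimal Python sketch of the election. It is a model, not device configuration: the switch names, priorities and addresses are invented for illustration, and it assumes preemption is enabled so that a higher-priority router actually takes over from the current active one. The rule it encodes is standard HSRP behaviour: highest priority wins, with the highest interface IP address as the tie-breaker.

```python
from ipaddress import IPv4Address
from typing import NamedTuple


class HsrpRouter(NamedTuple):
    name: str
    priority: int  # HSRP default is 100; higher wins
    ip: str        # interface address, used only as a tie-breaker


def hsrp_active(routers):
    """Return the router that would be active, assuming preemption is enabled."""
    return max(routers, key=lambda r: (r.priority, IPv4Address(r.ip)))


# Hypothetical per-VLAN priorities: CORE1 owns VLAN 10, CORE2 owns VLAN 20.
vlan10 = [HsrpRouter("CORE1", 110, "10.0.10.2"), HsrpRouter("CORE2", 90, "10.0.10.3")]
vlan20 = [HsrpRouter("CORE1", 90, "10.0.20.2"), HsrpRouter("CORE2", 110, "10.0.20.3")]

print(hsrp_active(vlan10).name)  # CORE1
print(hsrp_active(vlan20).name)  # CORE2
```

Swapping the two priorities for a VLAN is all it takes to move the active role, which is why the change can be made one VLAN at a time.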

Spanning Tree.

By shifting the root bridge for specific VLANs, you control the path traffic takes through the network.
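
The root election follows a similar pattern, but in the opposite direction: the lowest bridge ID wins. The sketch below is a simplified model under stated assumptions. It assumes a per-VLAN Spanning Tree variant (such as PVST+), where each VLAN elects its own root, and the names, priorities and MAC addresses are made up. The VLAN ID that such variants fold into the bridge ID is left out, because it is identical for every bridge in the same VLAN and does not change the result.

```python
from typing import NamedTuple


class Bridge(NamedTuple):
    name: str
    priority: int  # configured bridge priority; 32768 is the default, lower wins
    mac: str       # base MAC address; lower wins on a priority tie


def stp_root(bridges):
    """Return the bridge with the lowest (priority, MAC) bridge ID."""
    return min(bridges, key=lambda b: (b.priority, int(b.mac.replace(":", ""), 16)))


# Hypothetical values: each core is given the lower priority for "its" VLAN.
vlan10 = [Bridge("CORE1", 4096, "00:1a:2b:00:00:01"), Bridge("CORE2", 8192, "00:1a:2b:00:00:02")]
vlan20 = [Bridge("CORE1", 8192, "00:1a:2b:00:00:01"), Bridge("CORE2", 4096, "00:1a:2b:00:00:02")]

print(stp_root(vlan10).name)  # CORE1
print(stp_root(vlan20).name)  # CORE2
```

Giving the other core the second-lowest priority, rather than leaving it at the default, means it is the one that takes over as root if the first core fails.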

The important part is alignment.

If your HSRP active gateway lives on one switch but Spanning Tree prefers a path through another, you create inefficiency: traffic crosses links unnecessarily. So the two need to agree.

Once aligned, the effect is immediate. Traffic naturally splits.

Half flows through one core.
Half through the other.

No new hardware. No major redesign. Just better use of what was already in place.

There is, of course, a moment of tension when you do this.

Spanning Tree recalculates.
Ports transition states.
Traffic pauses briefly.

From the plan:

“The switch… will not forward any data… for a few seconds”

That’s expected. What you’re really watching for is something worse:

Loops.

That was the real risk.

“Spanning-Tree loop — low probability, highly disruptive”

So the process was controlled.

Backups taken.
Changes applied incrementally.
Verification at each step.
Checking root bridge status.
Checking HSRP roles.
Making sure reality matched expectation.
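
That last check, reality against expectation, is easy to express in code. The following is a small sketch rather than the tooling used on the day: it assumes you have already noted, per VLAN, which core is the Spanning Tree root and which is HSRP active (for example from `show spanning-tree` and `show standby brief`, if the platform is Cisco IOS, which is an assumption here). The VLAN numbers and switch names are hypothetical.

```python
def misaligned_vlans(stp_root_by_vlan, hsrp_active_by_vlan):
    """Return VLAN IDs where the STP root and HSRP active switch differ.

    A VLAN with no HSRP entry is also reported, since it cannot be confirmed.
    """
    return sorted(
        vlan
        for vlan, root in stp_root_by_vlan.items()
        if hsrp_active_by_vlan.get(vlan) != root
    )


# Hypothetical state noted down partway through the change.
stp_root_by_vlan = {10: "CORE1", 20: "CORE2", 30: "CORE1"}
hsrp_active_by_vlan = {10: "CORE1", 20: "CORE1", 30: "CORE1"}

print(misaligned_vlans(stp_root_by_vlan, hsrp_active_by_vlan))  # [20]
```

An empty list is the result you want before moving on to the next VLAN.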

Once complete, the network didn’t look dramatically different.

No new features.
No visible change to users.

But underneath, it behaved better.

Load was shared.
Failure domains were cleaner.
Both cores were doing real work.

Looking back, this wasn’t about Spanning Tree or HSRP. It was about intent. Understanding how different parts of a system interact, and making sure they’re aligned towards the same outcome.

Because redundancy on paper is easy. Making it real is where the work is.

Series: making-systems-predictable

David Dickinson