Monday, 17 March 2014

Cisco WLC N+1 Redundancy - APs Not Joining Redundant Controller

Just thought I'd post up a gotcha I hit today around Cisco N+1 redundancy.

In summary, I had a primary Cisco 5508 WLC (AIR-CT5508-50-K9) with a 5508 HA WLC (AIR-CT5508-HA-K9) as the secondary. I set it up for N+1 redundancy as per the Cisco guidelines (note: N+1 HA, not SSO):

http://www.cisco.com/c/en/us/td/docs/wireless/technology/hi_avail/N1_High_Availability_Deployment_Guide/N1_HA_Overview.html

Both WLCs were running 7.4.121.0 code.

The APs joined the primary controller as expected with no problems. However, when I failed the primary WLC, the APs would not join the secondary. A debug of CAPWAP events on the HA controller revealed the following messages:

*spamApTask2: Mar 17 12:34:43.679: 1c:1d:86:xx:xx:xx Discovery Request from 192.168.1.1:53528

*spamApTask2: Mar 17 12:34:43.679: 1c:1d:86:xx:xx:xx Join Priority Processing status = 0, Incoming Ap's Priority 1, MaxLrads = 500, joined Aps =0
*spamApTask2: Mar 17 12:34:43.680: 1c:1d:86:xx:xx:xx Discovery Response sent to 192.168.1.1:53528

*spamApTask2: Mar 17 12:34:43.680: 1c:1d:86:xx:xx:xx Discovery Response sent to 192.168.1.1:53528

*spamApTask2: Mar 17 12:34:53.675: 00:0f:24:2b:04:c2 DTLS connection not found, creating new connection for 192:168:20:2 (53528) 192:168:1:19 (5246)

*spamApTask2: Mar 17 12:34:55.674: 00:0f:24:2b:04:c2 DTLS connection not found, creating new connection for 192:168:20:2 (53528) 192:168:1:19 (5246)

*spamApTask2: Mar 17 12:34:59.674: 00:0f:24:2b:04:c2 DTLS connection not found, creating new connection for 192:168:20:2 (53528) 192:168:1:19 (5246)

*spamApTask2: Mar 17 12:35:07.674: 00:0f:24:2b:04:c2 DTLS connection not found, creating new connection for 192:168:20:2 (53528) 192:168:1:19 (5246)

*spamApTask1: Mar 17 12:35:53.762: 1c:1d:86:xx:xx:xx Discovery Request from 192.168.1.1:53527

*spamApTask1: Mar 17 12:35:53.762: 1c:1d:86:xx:xx:xx Join Priority Processing status = 0, Incoming Ap's Priority 1, MaxLrads = 500, joined Aps =0
*spamApTask1: Mar 17 12:35:53.762: 1c:1d:86:xx:xx:xx Discovery Response sent to 192.168.1.1:53527
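The pattern in that debug is that the HA controller answers the discovery requests, but the DTLS exchange for the AP keeps starting over and the AP never joins. On a busier controller that pattern is easy to miss, so here is a minimal Python sketch for pulling it out of a saved capture. It assumes the debug output has been copied to plain text in exactly the format shown above; the function name `dtls_retries`, the regex and the assumed year are my own, not anything Cisco provides. It groups the "DTLS connection not found" lines by AP MAC and reports the gap in seconds between attempts.

```python
import re
from collections import defaultdict
from datetime import datetime

# WLC debug timestamps carry no year; 2014 is an assumption taken from the post date.
ASSUMED_YEAR = 2014

# Matches debug lines in the format shown above, e.g.
# *spamApTask2: Mar 17 12:34:53.675: 00:0f:24:2b:04:c2 DTLS connection not found, ...
# The MAC pattern also accepts 'x' so that masked addresses still parse.
LINE_RE = re.compile(
    r"^\*(?P<task>[^:]+): "
    r"(?P<ts>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3}): "
    r"(?P<mac>(?:[0-9A-Fa-fx]{2}:){5}[0-9A-Fa-fx]{2}) "
    r"(?P<msg>.*)$"
)


def dtls_retries(log_text: str) -> dict[str, list[float]]:
    """Hypothetical helper: for each AP MAC, return the gaps (in seconds)
    between successive 'DTLS connection not found' debug lines."""
    seen = defaultdict(list)
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m or "DTLS connection not found" not in m.group("msg"):
            continue  # skip blank lines, discovery messages, etc.
        ts = datetime.strptime(f"{ASSUMED_YEAR} {m.group('ts')}",
                               "%Y %b %d %H:%M:%S.%f")
        seen[m.group("mac")].append(ts)
    # Pair each timestamp with the next one for the same MAC and take the difference.
    return {
        mac: [round((b - a).total_seconds(), 3) for a, b in zip(times, times[1:])]
        for mac, times in seen.items()
    }


if __name__ == "__main__":
    sample = """\
*spamApTask2: Mar 17 12:34:43.679: 1c:1d:86:xx:xx:xx Discovery Request from 192.168.1.1:53528
*spamApTask2: Mar 17 12:34:53.675: 00:0f:24:2b:04:c2 DTLS connection not found, creating new connection for 192:168:20:2 (53528) 192:168:1:19 (5246)
*spamApTask2: Mar 17 12:34:55.674: 00:0f:24:2b:04:c2 DTLS connection not found, creating new connection for 192:168:20:2 (53528) 192:168:1:19 (5246)
*spamApTask2: Mar 17 12:34:59.674: 00:0f:24:2b:04:c2 DTLS connection not found, creating new connection for 192:168:20:2 (53528) 192:168:1:19 (5246)
*spamApTask2: Mar 17 12:35:07.674: 00:0f:24:2b:04:c2 DTLS connection not found, creating new connection for 192:168:20:2 (53528) 192:168:1:19 (5246)
"""
    for mac, gaps in dtls_retries(sample).items():
        print(mac, gaps)  # prints: 00:0f:24:2b:04:c2 [1.999, 4.0, 8.0]
```

Run against the lines above, the gaps come out at roughly 2, 4 and 8 seconds, so the AP appears to be backing off and retrying the handshake rather than giving up, which fits the symptom of the HA WLC answering discovery but never completing the join.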

After lots of re-checking of the configuration and head-scratching, I called a colleague for inspiration. He told me he had recently seen a similar issue. The answer: a reboot of the HA WLC.

...it always seems obvious in hindsight.