Cisco WLC N+1 Redundancy - APs Not Joining Redundant Controller
Just thought I'd post up a gotcha I hit today around Cisco N+1 redundancy.
In summary I had a primary Cisco 5008 WLC (AIR-CT5508-50-K9) with a 5508 HA WLC (AIR-CT5508-HA-K9). I set it up for N+1 redundancy as per the Cisco guidelines (note HA, not SSO):
http://www.cisco.com/c/en/us/td/docs/wireless/technology/hi_avail/N1_High_Availability_Deployment_Guide/N1_HA_Overview.html
Both WLCs were running 7.4.121.0 code.
The APs joined the primary controller as expected with no problems. However, when I failed the primary WLC, the APs would not join the secondary. A debug of CAPWAP events on the HA controller revealed the following messages:
*spamApTask2: Mar 17 12:34:43.679: 1c:1d:86:xx:xx:xx Discovery Request from 192.168.1.1:53528
*spamApTask2: Mar 17 12:34:43.679: 1c:1d:86:xx:xx:xx Join Priority Processing status = 0, Incoming Ap's Priority 1, MaxLrads = 500, joined Aps =0
*spamApTask2: Mar 17 12:34:43.680: 1c:1d:86:xx:xx:xx Discovery Response sent to 192.168.1.1:53528
*spamApTask2: Mar 17 12:34:43.680: 1c:1d:86:xx:xx:xx Discovery Response sent to 192.168.1.1:53528
*spamApTask2: Mar 17 12:34:53.675: 00:0f:24:2b:04:c2 DTLS connection not found, creating new connection for 192:168:20:2 (53528) 192:168:1:19 (5246)
*spamApTask2: Mar 17 12:34:55.674: 00:0f:24:2b:04:c2 DTLS connection not found, creating new connection for 192:168:20:2 (53528) 192:168:1:19 (5246)
*spamApTask2: Mar 17 12:34:59.674: 00:0f:24:2b:04:c2 DTLS connection not found, creating new connection for 192:168:20:2 (53528) 192:168:1:19 (5246)
*spamApTask2: Mar 17 12:35:07.674: 00:0f:24:2b:04:c2 DTLS connection not found, creating new connection for 192:168:20:2 (53528) 192:168:1:19 (5246)
*spamApTask1: Mar 17 12:35:53.762: 1c:1d:86:xx:xx:xx Discovery Request from 192.168.1.1:53527
*spamApTask1: Mar 17 12:35:53.762: 1c:1d:86:xx:xx:xx Join Priority Processing status = 0, Incoming Ap's Priority 1, MaxLrads = 500, joined Aps =0
*spamApTask1: Mar 17 12:35:53.762: 1c:1d:86:xx:xx:xx Discovery Response sent to 192.168.1.1:53527
After lots of re-checking of the configuration and head-scratching I called a colleague for inspiration. He advised me he had seen recently a similar issue. The answer: a reboot of the HA WLC.
...it always seems obvious in hindsight.
In summary I had a primary Cisco 5008 WLC (AIR-CT5508-50-K9) with a 5508 HA WLC (AIR-CT5508-HA-K9). I set it up for N+1 redundancy as per the Cisco guidelines (note HA, not SSO):
http://www.cisco.com/c/en/us/td/docs/wireless/technology/hi_avail/N1_High_Availability_Deployment_Guide/N1_HA_Overview.html
Both WLCs were running 7.4.121.0 code.
The APs joined the primary controller as expected with no problems. However, when I failed the primary WLC, the APs would not join the secondary. A debug of CAPWAP events on the HA controller revealed the following messages:
*spamApTask2: Mar 17 12:34:43.679: 1c:1d:86:xx:xx:xx Discovery Request from 192.168.1.1:53528
*spamApTask2: Mar 17 12:34:43.679: 1c:1d:86:xx:xx:xx Join Priority Processing status = 0, Incoming Ap's Priority 1, MaxLrads = 500, joined Aps =0
*spamApTask2: Mar 17 12:34:43.680: 1c:1d:86:xx:xx:xx Discovery Response sent to 192.168.1.1:53528
*spamApTask2: Mar 17 12:34:43.680: 1c:1d:86:xx:xx:xx Discovery Response sent to 192.168.1.1:53528
*spamApTask2: Mar 17 12:34:53.675: 00:0f:24:2b:04:c2 DTLS connection not found, creating new connection for 192:168:20:2 (53528) 192:168:1:19 (5246)
*spamApTask2: Mar 17 12:34:55.674: 00:0f:24:2b:04:c2 DTLS connection not found, creating new connection for 192:168:20:2 (53528) 192:168:1:19 (5246)
*spamApTask2: Mar 17 12:34:59.674: 00:0f:24:2b:04:c2 DTLS connection not found, creating new connection for 192:168:20:2 (53528) 192:168:1:19 (5246)
*spamApTask2: Mar 17 12:35:07.674: 00:0f:24:2b:04:c2 DTLS connection not found, creating new connection for 192:168:20:2 (53528) 192:168:1:19 (5246)
*spamApTask1: Mar 17 12:35:53.762: 1c:1d:86:xx:xx:xx Discovery Request from 192.168.1.1:53527
*spamApTask1: Mar 17 12:35:53.762: 1c:1d:86:xx:xx:xx Join Priority Processing status = 0, Incoming Ap's Priority 1, MaxLrads = 500, joined Aps =0
*spamApTask1: Mar 17 12:35:53.762: 1c:1d:86:xx:xx:xx Discovery Response sent to 192.168.1.1:53527
After lots of re-checking of the configuration and head-scratching I called a colleague for inspiration. He advised me he had seen recently a similar issue. The answer: a reboot of the HA WLC.
...it always seems obvious in hindsight.