What Are Sticky Clients?

One term you'll often hear bandied about when talking with Wi-Fi professionals is "sticky clients". I thought it might be worth spending a few moments exploring what is meant by "sticky clients", why they are generally considered to be a bad thing in Wi-Fi networks, and some approaches to mitigate them.


Background

Many folks dealing with Wi-Fi networks often talk about the "sticky" characteristics of wireless clients when discussing client roaming within a Wi-Fi network. In an ideal world, we'd like to have access points providing ubiquitous coverage across our desired area, giving high-quality, consistent client coverage wherever we go. In this wonderful land of rainbows and unicorns, our clients would gracefully roam from AP to AP, detecting and associating with their closest AP throughout the coverage area to ensure their best connection speed at all times.
Unfortunately, the real world doesn’t quite reflect this roaming-client-Nirvana and is altogether more unpredictable and ugly. Some clients will cling to an AP like limpets in a rock pool – hanging on for all they’re worth to the last AP they heard. We’ll have a look at why clients may behave in this manner, what the effects of this behaviour may be, and how we might mitigate it.


Sticky Clients


Problem Description

Imagine the following scenario: a large open office area. The office area is so large that several access points (let’s say, 10) are required to provide Wi-Fi coverage across the whole area. Let’s assume this would provide all employees in that office with a reasonably good Wi-Fi signal level, no matter where they are in the office (due to the way the APs have been deployed evenly across the area).
Most people would (quite reasonably) expect that as employees move around the office with their Wi-Fi client (e.g. a laptop or tablet), it would be able to detect each nearest access point. As it moves into range, the client would hopefully join each nearest AP, as that AP will usually have the best available signal in terms of received signal level. This is analogous to how your cell phone moves between cell towers as you drive along a highway, moving in and out of range of each tower along the route.
Unfortunately, many Wi-Fi clients do not exhibit this expected behaviour. Instead, they tend to hang on to the original access point they associated with, rather than moving to a nearby AP that would generally be a better choice for them.

Roaming Decision Process

Clients generally should monitor indicators of the health of their wireless connection, such as the signal strength (RSSI) of their connection, their signal-to-noise ratio and the number of errors/retries they are experiencing on that connection. Once these indicators start to degrade, they should ideally start to probe for alternative access points, ready to make the jump to a new access point that will provide a better quality connection.
Unfortunately, there is no standardized definition of how a client will behave in this respect on a Wi-Fi network. You might expect that, as clients are subject to the 802.11 standard, there must be a defined rule to govern their behaviour. But this is not the case – there is no standard that mandates client roaming behaviour. The roaming decision is determined purely by the client device itself.
There is one fact that is worth repeating in that last paragraph: the roaming decision is a client decision, not a network decision. The access points and controllers do not tell a client when to roam – the network has to respect the wishes and behaviour of the client device. This means that if the client wishes to stay associated with one particular AP, there are few options available to the network to force a roaming decision.
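To make the idea of a client-side roaming decision a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the threshold values, the 8dB roaming margin and the function names are hypothetical, precisely because no standard defines them and every client vendor picks its own.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LinkHealth:
    rssi_dbm: float     # signal strength of the current AP as heard by the client
    snr_db: float       # signal-to-noise ratio on the current connection
    retry_rate: float   # fraction of frames that needed retransmission (0.0 to 1.0)


# Hypothetical thresholds; real clients use their own (often unpublished) values.
RSSI_FLOOR_DBM = -70.0
SNR_FLOOR_DB = 20.0
RETRY_CEILING = 0.20


def should_start_scanning(link: LinkHealth) -> bool:
    """Return True when any health indicator has degraded past its threshold."""
    return (
        link.rssi_dbm < RSSI_FLOOR_DBM
        or link.snr_db < SNR_FLOOR_DB
        or link.retry_rate > RETRY_CEILING
    )


def pick_roam_target(current_rssi_dbm: float,
                     candidates: Dict[str, float],
                     margin_db: float = 8.0) -> Optional[str]:
    """Pick the strongest candidate AP, but only if it beats the current AP by margin_db."""
    if not candidates:
        return None
    best_ap = max(candidates, key=candidates.get)
    if candidates[best_ap] >= current_rssi_dbm + margin_db:
        return best_ap
    return None   # nothing is sufficiently better, so stay put
```

A "sticky" client in this model is simply one whose floors are set very low, or whose margin is very large, so that it rarely decides a move is worthwhile.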

Effect of Sticky Behaviour

On the face of it, if a client chooses not to make good decisions and chooses not to roam to an AP that will provide a better connection speed, then it’s just its own stupid fault and it will have to put up with the lower connection speeds that will ensue. However, things aren’t quite that simple – it has a much wider impact than just poor connection speeds for the “sticky” client. Consider the very simple example shown in the graphic below:

Figure 1 - Client speed changes as a client moves away from an AP
A legacy 802.11a client joins AP1 as a user walks into a building. As it is in close proximity to the AP, it sees the AP at a high signal level and experiences low levels of frame loss and retransmissions. The client chooses to connect to AP1 and can achieve its full potential connection speed of 54Mbps.
As the user walks through the building and moves away from AP1, he/she approaches another AP: AP2. In an ideal world, we’d hope that our client has noticed that the signal level of AP1 is starting to drop off and that frame errors and retransmissions are starting to creep up. It would hopefully start to probe for new APs and perhaps associate with an AP that it can “hear” at a better level.
However, in many cases, the client will be quite happy with its established connection, but will just drop its connection speed to mitigate the rising frame errors and retransmissions. By moving to a lower speed it will be able to maintain its current connection, rather than having to move to a new AP. This behaviour may be repeated as the client moves away from AP2 and moves closer to AP3. Again, instead of re-associating to a new AP, the client drops its speed to mitigate rising errors and retransmissions as the signal strength of AP1 drops due to the increased distance between the client and AP.
The client will eventually reach a signal level and/or level of errors and retransmissions that it deems unacceptable and make a move to a new AP, but by then its connection may be in a significantly degraded state. This final decision point varies from client to client and is very often not even published by client vendors.
We discussed earlier that the impact of this behaviour is wider than just the client that experiences the reduced connection speed caused by its poor roaming decision. If we consider the lower transmission speeds that the client uses, we can perhaps start to understand the impact on other clients using the AP cell.
A client that uses lower speeds takes longer to send its data, compared to one connected at a higher speed. As WLANs are a contended medium, this means that other clients who also wish to send data over the air have to wait longer to gain control of the wireless medium to send their data. A very simple example is shown in the graphic below:

Figure 2 - Effect of clients connected at lower speeds

We have 3 clients connecting to AP1. Clients 1 & 2 are close to the 802.11a access point and have a full connection speed of 54Mbps. However, client 3 has moved away from AP1 and has chosen to remain connected to AP1, despite having the better choices of AP 2 & 3 available. To contend with rising error rates due to the lower signal levels it experiences from AP1, client 3 has dropped its connection speed to 24Mbps.
Consider that each client has 10Mbytes of data ready to send. The clients have to contend for access to the AP to send their data – they effectively “take it in turns” to send their data. If client 3 sends its data first, as it is only sending at 24Mbps it is going to take more than twice as long to send its 10Mbytes of data, compared to clients 1 & 2 (which both use a full 54Mbps connection). Clients 1 & 2 will have to wait more than twice as long as they would if client 3 was using a full speed 54Mbps connection. This reduces the overall efficiency of the cell, as clients are waiting longer than they should for a slower speed client to send its data. Things can get far worse if clients are allowed to drop to very low 802.11a connection speeds of 12 or 6Mbps.
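The "more than twice as long" figure is easy to check with some back-of-the-envelope arithmetic. The sketch below is deliberately naive: it assumes 1Mbyte is 8Mbits and ignores all protocol overhead (contention, acknowledgements, retransmissions), so the absolute times are optimistic, but the ratio between the rates is what matters here. The function name is just for illustration.

```python
def airtime_seconds(payload_mbytes: float, rate_mbps: float) -> float:
    """Time to send a payload at a given data rate, ignoring all protocol overhead."""
    payload_mbits = payload_mbytes * 8
    return payload_mbits / rate_mbps


fast = airtime_seconds(10, 54)   # a client close to the AP at 54Mbps
slow = airtime_seconds(10, 24)   # the sticky client at 24Mbps

print(f"54Mbps client: {fast:.2f} s")          # 1.48 s
print(f"24Mbps client: {slow:.2f} s")          # 3.33 s
print(f"ratio:         {slow / fast:.2f}x")    # 2.25x
```

Run the same sum for 6Mbps and the sticky client occupies the air for nine times as long as a 54Mbps client sending the same data.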
The key to high performance Wi-Fi networks is airtime efficiency. As the wireless medium is contended (i.e. shared), we want wireless clients to send their data as quickly as possible and to free up the airspace as quickly as possible, ready for the next client who has data to send. Even a small number of sticky clients, using sub-optimal speeds, can very quickly drag down the performance of an AP cell.

Reasons for Sticky Behaviour

This behaviour seems, on the face of it, to be a very counterintuitive way to behave when we’re trying to get clients to play nicely on an Enterprise WLAN. It’s infuriating that clients exhibit this behaviour that causes such a negative impact on the operation of our WLAN. But, if we look at the client’s reasoning on this subject, the behaviour starts to make a little more sense.
Many clients are designed for, or assume they will operate in, the home network environment. They are consumer-class devices, not specifically designed for the Enterprise. Their expectation is that they will have a single access point deployed somewhere in a home and they had better make sure they can contact that AP, no matter what it takes. If you think of your own home network (unless you’re a Wi-Fi geek), you’ve likely got a single AP/router that is provided by your cable or broadband provider. As you move around your home, the signal level experienced is going to get quite challenging in particular areas of your home (yeah, let’s face it, we all take the iPad into the bathroom to read).
As the client assumes it is likely to have only this single AP to work with, it is going to employ all measures it can to stick to that AP. This is great in the home environment – not so great in the Enterprise…
Now that we perhaps understand the psyche of the typical wireless client a little more clearly, perhaps we can lose some of our anger at its behaviour on our Enterprise wireless network and decide on the best way to deal with it. (The idea of a hammer and your average Mac does have a certain appeal though…)


Sticky Client Mitigation

So, how do we try to mitigate this sticky behaviour? We have 3 choices:
  • Give the client some friendly advice to try to encourage better roaming decisions
  • Make life difficult for the client to encourage better roaming decisions
  • Slap the client in the face to force a roaming decision


Friendly Advice

Some clients will actually take advice from access points about alternative APs that they might consider when it comes to roaming decisions. Two 802.11 standard amendments can be employed to provide information about alternative APs that a client may consider. These are particular features within the 802.11k and 802.11v amendments. One caveat to bear in mind is that not all clients may support these standard amendments (and you’ll need to check that your WLAN gear supports them too).

802.11k

802.11k is the Radio Resource Management 802.11 standard amendment. It includes a range of mechanisms for performing various measurements of the WLAN station’s environment and allows a client to request information about that environment. One of the most useful mechanisms from a client roaming perspective is the Neighbor Report. A Neighbor Report is requested by a client to obtain a list of the APs that its current AP knows about. Having this information significantly improves a client’s ability to make a roaming decision.
Without a Neighbor Report, a client has to either passively or actively scan for alternative APs. Passive scanning involves periodically going off channel and listening for beacon frames from nearby APs. Active scanning requires a client to go off-channel and send probe requests out to any APs in the vicinity to see who responds. From this probing activity, a client can build a list of potential APs that it may roam to.
Scanning across all channels is a costly business in terms of time for the client. When considering the 5GHz band, which may have 20+ channels to scan, a client could spend quite a while scanning before finding a suitable AP – it may even run out of time to scan all channels before being forced into a (sub-optimal) roaming decision. With a Neighbor Report from the AP, the client can get very similar information in a single frame. This is far more efficient and assists our client in making roaming decisions in a much shorter time frame. With information more readily available, the client is perhaps likely to make better informed decisions, more often.
The use of 802.11k is dependent on client support for this feature, but it is becoming well supported amongst newer mobile devices. Often, it may also need to be explicitly enabled on your Wi-Fi infrastructure equipment. 
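To give a feel for why a Neighbor Report saves time, here is a small sketch comparing a full 5GHz scan with a scan limited to the channels named in a report. The numbers are assumptions: the 20ms per-channel dwell time is hypothetical, the channel list is illustrative (the real set depends on your regulatory domain), and the report is modelled as a simple list of (BSSID, channel) pairs rather than the real frame format.

```python
# Hypothetical per-channel dwell time for an active scan; real values vary by client.
DWELL_MS_PER_CHANNEL = 20

# An illustrative 5GHz channel list; the exact set depends on regulatory domain.
ALL_5GHZ_CHANNELS = [36, 40, 44, 48, 52, 56, 60, 64,
                     100, 104, 108, 112, 116, 120, 124, 128,
                     132, 136, 140, 149, 153, 157, 161, 165]


def channels_to_scan(neighbor_report):
    """Return the channels worth scanning.

    neighbor_report is a list of (bssid, channel) tuples taken from the AP's
    Neighbor Report, or None if the client or AP does not support 802.11k.
    """
    if not neighbor_report:
        return list(ALL_5GHZ_CHANNELS)   # no report: fall back to a full scan
    return sorted({channel for _bssid, channel in neighbor_report})


def scan_time_ms(channels):
    """Total time spent off-channel, assuming a fixed dwell time per channel."""
    return len(channels) * DWELL_MS_PER_CHANNEL


report = [("aa:bb:cc:00:00:01", 36), ("aa:bb:cc:00:00:02", 100), ("aa:bb:cc:00:00:03", 36)]
print(scan_time_ms(channels_to_scan(None)))     # full scan of 24 channels
print(scan_time_ms(channels_to_scan(report)))   # only channels 36 and 100
```

With these assumed figures the full scan costs 480ms of off-channel time, against 40ms when the client only visits the two channels its AP has told it about.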

802.11v

802.11v is the Wireless Network Management amendment to the 802.11 standard. It defines a service that allows stations on a WLAN (APs and clients) to exchange data that provides them with awareness of network conditions.
One of the mechanisms provided in 802.11v is ‘BSS Transition Management’. This mechanism allows an access point to request that a client transitions to a specific AP, or to supply a set of preferred APs.  This mechanism can again provide our client with improved roaming decision data to facilitate better roaming decisions.
Support amongst clients for 802.11v doesn’t seem to be quite so widespread as 802.11k at the time of writing, but it could still provide a useful option for those that support it. Again, it is a feature that will generally explicitly need to be enabled on your Wi-Fi infrastructure gear.
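As a rough illustration of how a client might treat a BSS Transition Management request, the sketch below models the request as a list of candidate APs, each with a preference value supplied by the network. This is a loose model and not the real frame format; the function name, the data shapes and the -70dBm acceptance threshold are all hypothetical. The point it tries to show is that the client still applies its own judgement to the suggestion.

```python
from typing import Dict, List, Optional, Tuple

# (bssid, preference) pairs as offered by the AP; a higher preference means "more preferred".
Candidate = Tuple[str, int]


def handle_transition_request(candidates: List[Candidate],
                              heard_rssi_dbm: Dict[str, float],
                              min_rssi_dbm: float = -70.0) -> Optional[str]:
    """Return the BSSID the client chooses to move to, or None if it declines.

    heard_rssi_dbm maps BSSID to the RSSI measured by the client itself.
    min_rssi_dbm is a hypothetical client-side acceptance threshold.
    """
    # Work through the AP's suggestions in order of the network's preference.
    for bssid, _preference in sorted(candidates, key=lambda c: c[1], reverse=True):
        rssi = heard_rssi_dbm.get(bssid)
        if rssi is not None and rssi >= min_rssi_dbm:
            return bssid
    return None   # the client is free to ignore the suggestion entirely
```

For example, if the network's favourite candidate is heard at only -80dBm but its second choice is heard at -60dBm, this client would pick the second choice; if it can hear none of the candidates well enough, it stays where it is.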

Information Only

As we’ve stated previously, despite all of the good information that the 802.11k and 802.11v amendments provide, this is only advisory information – the final decision rests with the client, who may still choose to ignore this useful information.
Remember, even though our WLAN infrastructure may be able to provide this information to WLAN clients, not all clients will support the 802.11v & 802.11k amendments. Support of amendments such as 802.11k & v is optional for Wi-Fi clients. Check client specification data to understand which amendments may be supported (Wi-Fi Alliance certification pages are a great place to start).

Making Life Difficult

If we can’t get our clients to roam nicely through standardized mechanisms, then we have to get a little more persuasive and start to make life a little more difficult. We need to employ a technique to make it difficult for a client to remain associated as it moves further away from an AP.
As the distance between a client and access point increases, the signal level of the AP received by the client reduces, along with signal quality. This generally means connection error rates will start to increase as signal quality decreases. To compensate for the rise in errors, a client will drop its connection speed to a lower rate, which will be more tolerant of the lower signal level and bring its error rate back down to an acceptable level. This mechanism will be repeated a number of times as the client moves away from the AP, dropping through the connection speeds supported by the AP until the lowest speed is reached. Once the lowest supported rate is reached and the error rate starts to rise, the client will have no option but to try to find an alternative AP to associate with.
Not all clients will stick to the same AP until they have no other choice; they may decide to roam much earlier in the cycle of stepping through the lower rates. But a particularly sticky client may get to fairly low speeds before deciding to move on.
We can use this mechanism to our advantage to encourage clients to roam at higher rates. We can ensure that they don’t have the opportunity to hit lower rates (and hence cause the inefficiency that lower rates bring to an AP cell). By switching off support for lower connection rates that are generally available from an AP, a client simply does not have the option to use lower rates and is forced to roam to another AP much sooner.
For instance, many APs will, by default, support connection rates all the way from very high-end speeds (e.g. perhaps 866Mbps for an 802.11ac AP) down to very low legacy speeds (e.g. 6Mbps on 5GHz, or 1Mbps on 2.4GHz). If we override the support for the lower rates so that they are not available, a client simply cannot “hang on” to an AP over such a large area. Note that support for the lower rates is often enabled by default on Wi-Fi infrastructure gear and needs to be explicitly disabled.
A common approach is to disable lower speeds so that the lowest supported speed on 2.4GHz is perhaps 12Mbps or 24Mbps. For 5GHz, similar minimum speeds are often selected. A client will be aware from AP beacon information which rates an AP supports, so will know that it has to find a new AP once it reaches the minimum supported rates (e.g. 12Mbps) – it will have to roam much sooner than if it had the option to drop to 1Mbps.


Figure 3 - Client remaining on same AP at low supported basic rate

Figure 4 - Client being forced to seek alternative AP due to higher supported basic rate

This technique is very effective and is widely used. Many WLAN vendors recommend disabling lower data rates as a matter of best practice. Generally, disabling rates below 12 or 24Mbps is recommended in higher density deployments, though this needs to be tested in your environment. (Note that these rates affect the achievable rates of management traffic across the cell, rather than actual data connection speeds.)
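The effect of trimming the lower rates can be sketched with a simple lookup. The table below maps each 802.11a rate to the signal level needed to sustain it; treat these figures as assumed, illustrative values only, since real radios and real environments will differ. The function name is also just for illustration.

```python
# Illustrative RSSI (dBm) needed to sustain each 802.11a rate (Mbps).
# Assumed values for this sketch; real radios differ.
MIN_RSSI_FOR_RATE = {6: -82, 9: -81, 12: -79, 18: -77, 24: -74, 36: -70, 48: -66, 54: -65}


def best_rate(rssi_dbm, enabled_rates):
    """Highest enabled rate the signal can sustain, or None if the client must leave."""
    usable = [rate for rate in enabled_rates if rssi_dbm >= MIN_RSSI_FOR_RATE[rate]]
    return max(usable) if usable else None


all_rates = sorted(MIN_RSSI_FOR_RATE)              # AP default: every rate enabled
trimmed = [rate for rate in all_rates if rate >= 24]   # rates below 24Mbps disabled

for rssi in (-60, -72, -76, -80):
    print(rssi, best_rate(rssi, all_rates), best_rate(rssi, trimmed))
```

With these assumed figures, a client hearing its AP at -76dBm can limp along at 18Mbps when every rate is enabled, but has no usable rate at all once everything below 24Mbps is disabled, so it has to look for another AP.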

Getting Tough With Clients

If clients just won’t take the hint and are still determined to hang on to their AP, it’s time for the nuclear option. It’s time to get tough.
If we can’t coerce a client to make the jump to a better choice AP by limiting its connection speed options or by giving it friendly advice, then we have two options left:
  • Simply stop talking to the client
  • Kick the client off the AP
Although these may seem like extreme measures, as the late, great Spock once said: “Logic clearly dictates that the needs of the many outweigh the needs of the few”. Sometimes it’s better to cause a little discomfort to an uncooperative client to ensure the better performance that will be afforded to other clients on the same AP cell.

RSSI Threshold

One of the first mechanisms introduced by WLAN vendors to “get tough” with clients was the concept of an RSSI threshold. In simple terms, if clients fall below a particular RSSI level (as viewed from an AP’s point of view), then simply stop talking to them. They will then be forced to use an alternative AP.
For example, if we assume that any clients that an AP sees at an RSSI of -80dBm or less are probably too far away from that AP (and they should be moving to an alternative), we set an RSSI threshold of -80dBm. As the client moves away from the AP, once the AP hears it at a level below -80dBm, it simply ignores it. The client eventually figures out that its AP has “gone” and has to look for a new one.
Although this method is great at “pruning” clients from an AP cell, it’s worth considering a couple of points. Not all clients operate at the same transmit power level. A laptop is generally going to run at a higher transmit power level than a smartphone.  Although your chosen RSSI level may work well for laptops, you may end up cutting off smartphones far more often, which may not have as many (or any) alternative APs to move to.
Also, it’s worth considering the effect on the wireless driver and applications running on the client device. They have no indication that they are no longer connected to an AP for (potentially) quite a long period of time. If an AP simply stops talking to a client, there may be quite a few attempts to re-establish communications with the AP before giving up. This is obviously not a great situation for the client, or the applications it may be running.
(Note: In the Cisco world, this feature is “Rx-SOP” on AireOS WLCs)
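The transmit power caveat is easy to see with a little arithmetic. In the sketch below, the transmit powers and the path loss are hypothetical numbers chosen for illustration, antenna gains are ignored, and the function names are my own. It places a laptop and a smartphone at the same spot and asks whether the AP would still talk to each of them with the -80dBm threshold from the example above.

```python
RSSI_THRESHOLD_DBM = -80   # the example threshold used above


def rssi_at_ap(tx_power_dbm, path_loss_db):
    """Level at which the AP hears the client: transmit power minus path loss."""
    return tx_power_dbm - path_loss_db


def ap_will_respond(client_rssi_at_ap_dbm, threshold_dbm=RSSI_THRESHOLD_DBM):
    """The AP ignores clients it hears below the threshold."""
    return client_rssi_at_ap_dbm >= threshold_dbm


# Hypothetical transmit powers and path loss for two devices at the same location.
path_loss_db = 92
laptop = rssi_at_ap(tx_power_dbm=17, path_loss_db=path_loss_db)       # -75 dBm
smartphone = rssi_at_ap(tx_power_dbm=10, path_loss_db=path_loss_db)   # -82 dBm

print("laptop:", laptop, ap_will_respond(laptop))
print("smartphone:", smartphone, ap_will_respond(smartphone))
```

Same location, same AP, same threshold: the laptop stays connected while the smartphone is cut off, simply because it transmits at a lower power.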

RSSI Threshold with De-authentication

As we hinted in the previous section, simply ignoring a client below a particular RSSI  level may not be the best approach. Although we may force a roam, it may take the client quite a while to figure out what’s going on and come up with an alternative option. If clients are roaming often (e.g. perhaps in a public venue such as a stadium) this could be a real issue.
Another technique is to couple the RSSI threshold technique described previously with a brute-force de-authentication of the client once it reaches a target RSSI threshold.
Although this seems a brutal approach (we’re literally throwing the client off the network), it does provide a fast, positive confirmation to the client that they are no longer associated with the network. The client no longer has to wait for an extended period to find out that an AP is no longer available (as was the case in the previous section). If the client knows sooner that it has to find a new AP, it can re-establish a connection more quickly than with the previous RSSI threshold technique.
At least a couple of vendors make this technique available: Aruba use the technique as part of their ‘ClientMatch’ feature, and Cisco use it as part of their ‘Optimized Roaming’ feature.
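A minimal sketch of the threshold-plus-de-authentication idea is shown below. It is not a description of how any vendor implements its feature. The class name, the "keep"/"deauth" return values and, in particular, the requirement for several consecutive low samples before acting are my own assumptions, added so that a single momentary fade does not get a client thrown off.

```python
from collections import defaultdict


class RssiEnforcer:
    """Decide, per RSSI sample, whether an AP should de-authenticate a client."""

    def __init__(self, threshold_dbm=-80, samples_required=3):
        self.threshold_dbm = threshold_dbm
        # Assumption: act only after this many consecutive samples below threshold.
        self.samples_required = samples_required
        self._low_count = defaultdict(int)

    def observe(self, client_mac, rssi_dbm):
        """Record one RSSI sample; return "deauth" when the client should be removed."""
        if rssi_dbm >= self.threshold_dbm:
            self._low_count[client_mac] = 0      # signal recovered, reset the count
            return "keep"
        self._low_count[client_mac] += 1
        if self._low_count[client_mac] >= self.samples_required:
            self._low_count[client_mac] = 0
            return "deauth"                      # tell the client explicitly to move on
        return "keep"


enforcer = RssiEnforcer()
samples = (-82, -83, -70, -81, -82, -85)
actions = [enforcer.observe("client-1", rssi) for rssi in samples]
print(actions)
```

In this run the brief recovery to -70dBm resets the count, so the client is only de-authenticated on the final sample, after three consecutive readings below -80dBm.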

Figure 5 - Getting tough with clients

Should I Get Tough With My Clients?

The two techniques we have discussed in this section that involve RSSI threshold enforcement should be considered as “specialist” techniques. They should be used as a last resort for specialized applications (e.g. high density public events).
If you are using any sort of latency-sensitive applications, then RSSI threshold techniques should not be used – the latency introduced by the enforced roaming is going to hit your applications (e.g. voice applications do not take kindly to simply being thrown off the network).

Summary

We’ve covered quite a bit of ground in this article, looking at why clients may be “sticky” and how we might mitigate their behaviour. We looked at:
  • What we mean by the “sticky” behaviour of clients
  • How clients make roaming decisions
  • What the effects of sticky clients may be on other clients in the same AP cell
  • Why clients may exhibit sticky behaviour
  • Possible approaches to mitigate sticky behaviour, including:
    • Giving clients better information through standards-based approaches (802.11k & 802.11v)
    • Limiting lower data rates to encourage roaming
    • Using vendor-supplied techniques which rely on actions based on received RSSI signal levels
One caveat worth highlighting is that any of the mitigation approaches we have discussed should be used with care on an existing, live wireless LAN. When making any changes that may affect client behaviour, there may be unintended consequences. Apply changes with care, perhaps assessing the effect on a test network or a subset of your network before rolling out across the entire network.
