Wi-Fi Roaming Analysis (Part 1 – Connection Control)

Advanced protocol analysis is becoming an increasingly important skill for Wi-Fi engineers as networks grow more sophisticated and complex. The wireless LAN market is a tremendously innovative and fast-changing landscape, and the skills needed to understand and dissect how these networks work are highly valuable.

One of the most important aspects of building a successful enterprise wireless LAN is ensuring adequate Wi-Fi roaming performance. However, Wi-Fi roaming is a complex subject due to the many variations of Wi-Fi security found in the marketplace and the historical difficulty of gathering and analyzing roaming data.

In this series I will provide an overview of Wi-Fi roaming and how it works, and offer readers guidance on how to capture, measure, and analyze the roaming performance of clients within their own environments. I’ll also highlight a few professional tools and tricks of the trade that make this process simpler than manual analysis.

Wi-Fi Roaming Definition
Roaming, in the context of an 802.11 wireless network, is the process of a client moving an established Wi-Fi network association from one access point to another access point within the same Extended Service Set (ESS) without losing its connection (i.e. the transition completes within a defined time interval, usually in the range of a few seconds).
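To make the definition concrete, the check below tests a single transition against its three conditions: same ESS, different AP, and completion within a time interval. It is a minimal sketch that assumes an ESS can be identified by its SSID and uses a hypothetical `max_gap_s` of 3 seconds to stand in for "a few seconds"; the `Attachment` type and `is_roam` function are illustrative names, not part of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attachment:
    """A client's association to one AP (BSSID) within an ESS, identified here by SSID."""
    ssid: str
    bssid: str


def is_roam(previous: Attachment, new: Attachment, gap_s: float,
            max_gap_s: float = 3.0) -> bool:
    """Return True if a transition between two attachments fits the roaming definition.

    max_gap_s is a hypothetical stand-in for the "defined time interval";
    tune it to the tolerance of the applications on your network.
    """
    same_ess = previous.ssid == new.ssid          # must stay within the same ESS
    different_ap = previous.bssid != new.bssid    # must actually move to another AP
    within_interval = 0.0 <= gap_s <= max_gap_s   # connection not considered lost
    return same_ess and different_ap and within_interval


old = Attachment("corp", "00:11:22:33:44:01")
new = Attachment("corp", "00:11:22:33:44:02")
print(is_roam(old, new, gap_s=0.05))  # True
print(is_roam(old, new, gap_s=8.0))   # False: gap exceeds the interval
```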

It is also helpful to distinguish between the different wireless connection scenarios that may occur. Delineating them provides a better understanding of how and when each scenario occurs and why performance varies between scenarios, and it aids in establishing performance baselines.

  • Initial Connection – The client has no previous 802.11 association to the ESS (any AP advertising the same SSID). This situation requires the client to perform all connection and authentication steps defined in the network policy before network access is achieved. The time required for a client to perform an initial connection will be the same as for a wireless roam unless fast roaming or session caching techniques are implemented. Completing full 802.1X authentication in secure wireless environments takes considerably longer than in open or pre-shared key (PSK) networks, making fast roaming techniques highly desirable. Fast roaming may even be required depending on the network architecture and applications implemented (e.g. branch / remote office networks with a central RADIUS server across the WAN increase the time to complete EAP authentication and can render real-time voice applications unusable).
  • Wireless Roaming – The client has an established 802.11 association to an infrastructure AP and migrates its connection to another AP within the same ESS. Association to the new AP terminates the previous AP association either implicitly or explicitly (the 802.11 standard allows only one association at a time). The goal of a wireless roam is to identify an alternate AP that can provide better service to the client than the current AP. Wireless client roaming algorithms are typically optimized to minimize the time required to transition between APs in order to avoid disrupting network access for client applications. This can be accomplished through fast roaming or session caching techniques that eliminate some of the authentication steps. Fast roaming can only occur after an initial connection has been performed, ensuring the client has successfully completed all authentication and authorization required by the network policy.
  • Connection Termination & Re-Establishment – The client has an established 802.11 association, but performance degrades so severely that the connection becomes unusable. The client and/or AP must recognize the degraded connection, which may not be explicitly apparent, then terminate it and establish a new connection from scratch. A connection could degrade for a number of reasons, including interference, multipath (with older 802.11a/b/g clients), excessive packet error rate, moving out of range, or a roam not completed within the client’s time threshold. When analyzing client roaming events, it is necessary to determine whether the client performed a wireless roam or terminated and re-established its network connection. A terminated connection calls for solutions that remediate underlying issues affecting network stability, whereas the focus with wireless roaming is to improve performance.
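Telling these three scenarios apart in a capture can be partly automated. The sketch below walks one client's management frames in order and labels each (re)association attempt. It is a heuristic under stated assumptions: the frame-type labels (`assoc_req`, `reassoc_req`, `deauth`, `disassoc`) are hypothetical shorthand rather than any capture tool's field names, and a client that explicitly disassociates from its old AP immediately before roaming would be mislabeled as a termination & re-establishment, so results still need human review.

```python
from enum import Enum
from typing import List, Optional, Sequence, Tuple


class Scenario(Enum):
    INITIAL_CONNECTION = "initial connection"
    ROAM = "wireless roam"
    TERMINATION_REESTABLISHMENT = "termination & re-establishment"


# One client's management frames in capture order: (frame_type, bssid).
# The frame_type labels are hypothetical shorthand, not a capture tool's field names.
Event = Tuple[str, str]


def classify(events: Sequence[Event]) -> List[Scenario]:
    """Heuristically label every (re)association attempt in a client's event trace."""
    labels: List[Scenario] = []
    current_ap: Optional[str] = None   # AP the client is currently associated to
    ever_associated = False            # has the client connected to this ESS before?
    for frame_type, bssid in events:
        if frame_type in ("deauth", "disassoc"):
            # Explicit teardown: the client no longer has an association to carry over.
            current_ap = None
        elif frame_type in ("assoc_req", "reassoc_req"):
            if not ever_associated:
                labels.append(Scenario.INITIAL_CONNECTION)
            elif current_ap is not None and bssid != current_ap:
                # Still associated elsewhere: the new association implicitly ends the old one.
                labels.append(Scenario.ROAM)
            else:
                # Prior association was torn down (or the client rejoined the same AP).
                labels.append(Scenario.TERMINATION_REESTABLISHMENT)
            current_ap = bssid
            ever_associated = True
    return labels


trace = [("assoc_req", "AP1"), ("reassoc_req", "AP2"),
         ("deauth", "AP2"), ("assoc_req", "AP3")]
for label in classify(trace):
    print(label.value)
# initial connection
# wireless roam
# termination & re-establishment
```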

Additionally, identifying which scenario occurred can be incredibly valuable during protocol analysis and troubleshooting, because it helps determine what is happening with a client’s network connection when the client cannot be directly observed (e.g. remote troubleshooting).

Connection Control
Wi-Fi network connection establishment and roaming are decentralized, controlled almost entirely by the client. The 802.11 standard explicitly places control of wireless connection establishment in the hands of clients by defining various logical services and dividing their implementation between clients and access points.

Think of the AP as a hotel concierge: "Welcome to the Distribution System! Your requested association is ready."

The access point is responsible for association services, which inform the broader network of the STA-to-AP mapping, and for delivering data between stations across the network. This mapping is also why an 802.11 client station can only be associated to a single AP at a time: the network must be able to deliver data to the correct AP.

Some of these services require integration with external networks (e.g. the distribution system [DS] outside the basic service set [BSS]), which is not defined by the 802.11 standard but is typically an 802.3 wired Ethernet network. These services are implemented only in wireless access points and include association and disassociation services, among others. It is important to understand that although APs provide association services for client stations, it is the client station that invokes the association process. It may be difficult to conceptualize how client stations control connection establishment when the association service is implemented only within APs. However, remember that the 802.11 standard defines “services”: the AP provides the association service, and the client invokes it.
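A toy model helps show why the STA-to-AP mapping forces a single association. In the sketch below, the mapping is a dictionary keyed by station, so a newer association simply replaces the older one, and a late disassociation from the old AP is ignored. The `DistributionSystem` class and its method names are hypothetical; real distribution systems and controllers implement this differently, and the 802.11 standard does not define the DS mechanism itself.

```python
from typing import Dict, Optional


class DistributionSystem:
    """Toy model of the STA-to-AP mapping that association services keep current."""

    def __init__(self) -> None:
        self._sta_to_ap: Dict[str, str] = {}

    def associate(self, sta: str, ap: str) -> None:
        # Called when an AP grants a (re)association that the client station requested.
        # One dict entry per station: a new association overwrites the old one,
        # mirroring the one-AP-at-a-time rule.
        self._sta_to_ap[sta] = ap

    def disassociate(self, sta: str, ap: str) -> None:
        # Ignore stale teardowns from an AP the station has already left.
        if self._sta_to_ap.get(sta) == ap:
            del self._sta_to_ap[sta]

    def deliver_to(self, sta: str) -> Optional[str]:
        # Which AP should receive traffic destined for this station?
        return self._sta_to_ap.get(sta)


ds = DistributionSystem()
ds.associate("client-1", "AP1")
ds.associate("client-1", "AP2")     # client roams; the AP1 mapping is replaced
ds.disassociate("client-1", "AP1")  # late notice from the old AP has no effect
print(ds.deliver_to("client-1"))    # AP2
```

Note that the client drives every change here: the model only reacts to associations the client requested.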

Infrastructure Influence
Wi-Fi infrastructure vendors have developed proprietary features to influence client behavior. One example is the Cisco Compatible Extensions (CCX) program, which includes AP-assisted roaming through neighbor reports, fast roaming enhancements, RF scanning, client reporting, and roaming diagnostics. Another example is the band-steering feature provided by many vendors, which typically works by delaying probe responses to dual-band clients in order to influence them to join a 5GHz BSS instead of a 2.4GHz BSS (otherwise many clients “stick” to 2.4GHz with high prejudice, although manufacturers are starting to change this preference as 5GHz Wi-Fi networks become more prevalent). Finally, the IEEE has standardized a set of radio resource measurement enhancements in the 802.11k amendment, which allows the infrastructure to send “Neighbor Reports” to the client to aid its scanning and roaming decisions. See the CWNP whitepaper on RSN Fast BSS Transition (free registration required) for more information on 802.11k and neighbor reports.
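The band-steering idea can be sketched as a small AP-side policy. Here "delaying" is modeled as withholding the first few 2.4GHz probe responses to a client recently heard probing on 5GHz, then answering so a persistent client is not locked out. This is an illustration only: the `BandSteering` class, the 60-second memory, and the limit of three withheld responses are hypothetical values, and actual vendor implementations are proprietary and vary.

```python
import time
from typing import Dict, Optional


class BandSteering:
    """Hypothetical band-steering policy: hold back 2.4 GHz probe responses
    to clients that have recently been heard probing on 5 GHz."""

    def __init__(self, memory_s: float = 60.0, max_suppressed: int = 3) -> None:
        self.memory_s = memory_s              # how long a 5 GHz sighting counts
        self.max_suppressed = max_suppressed  # answer after this many withheld probes
        self._last_seen_5ghz: Dict[str, float] = {}
        self._suppressed: Dict[str, int] = {}

    def on_probe(self, client_mac: str, band_ghz: float,
                 now: Optional[float] = None) -> bool:
        """Return True if the AP should answer this probe request now."""
        now = time.monotonic() if now is None else now
        if band_ghz >= 5:
            self._last_seen_5ghz[client_mac] = now
            return True
        seen = self._last_seen_5ghz.get(client_mac)
        dual_band = seen is not None and now - seen <= self.memory_s
        if not dual_band:
            return True   # single-band (or unknown) clients are never steered
        count = self._suppressed.get(client_mac, 0)
        if count < self.max_suppressed:
            self._suppressed[client_mac] = count + 1
            return False  # withhold the 2.4 GHz response to nudge toward 5 GHz
        return True       # persistent client: answer rather than lock it out


steer = BandSteering()
print(steer.on_probe("aa:bb", 5.0, now=0.0))   # True: 5 GHz probes are always answered
print([steer.on_probe("aa:bb", 2.4, now=1.0) for _ in range(4)])  # [False, False, False, True]
print(steer.on_probe("cc:dd", 2.4, now=1.0))   # True: never seen on 5 GHz
```

The fallback after a few withheld responses reflects the trade-off any steering feature faces: the client still decides, so the infrastructure can only nudge it.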

Proprietary Client Implementations
Since the connection is controlled by the client station, the decision of when to roam typically relies on an internal algorithm developed by the manufacturer. Client roaming algorithms are not standardized and are proprietary intellectual property of each manufacturer. This results in highly variable client roaming performance across manufacturers and implementations.

However, from a high-level perspective, client stations typically perform the same general steps when roaming:

  1. Passive / Active scanning in the background to identify other APs that are within range
  2. Client roam triggers (exact algorithms are vendor proprietary, but are commonly based on signal strength thresholds, RSSI heuristics between APs, data rate shifting, retry and error rates)
  3. Active scanning to confirm the new AP is still available
  4. Roam to the new AP
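The four steps above can be sketched as a simple signal-strength roam trigger with hysteresis, one of the common approaches mentioned in step 2. This is a minimal sketch, not any manufacturer's algorithm: the `RoamPolicy` values (-70 dBm trigger, 8 dB hysteresis) are hypothetical, and real clients also weigh data rates, retries, and error rates.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class RoamPolicy:
    # Hypothetical values; real client thresholds are proprietary and vary widely.
    trigger_dbm: float = -70.0   # step 2: only consider roaming below this RSSI
    hysteresis_db: float = 8.0   # a candidate must be at least this much stronger


def choose_roam_target(current_bssid: str, rssi_dbm: Dict[str, float],
                       policy: RoamPolicy = RoamPolicy()) -> Optional[str]:
    """Return the BSSID to roam to, or None to stay on the current AP.

    rssi_dbm holds the latest RSSI for every BSSID seen during background
    scanning (step 1), including the current AP.
    """
    current = rssi_dbm[current_bssid]
    if current >= policy.trigger_dbm:
        return None  # signal still acceptable: no roam trigger
    candidates = {b: r for b, r in rssi_dbm.items() if b != current_bssid}
    if not candidates:
        return None  # nowhere to go
    best = max(candidates, key=lambda b: candidates[b])
    if candidates[best] - current >= policy.hysteresis_db:
        # The caller would confirm with an active scan (step 3) before roaming (step 4).
        return best
    return None  # best candidate is not enough of an improvement


print(choose_roam_target("AP1", {"AP1": -65.0, "AP2": -50.0}))  # None
print(choose_roam_target("AP1", {"AP1": -75.0, "AP2": -60.0}))  # AP2
print(choose_roam_target("AP1", {"AP1": -75.0, "AP2": -72.0}))  # None
```

The hysteresis margin keeps a client from bouncing between two APs of similar strength; shifting either threshold is exactly the kind of vendor choice that produces the variability described above.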

Comparison to Cellular Networks
For comparison, consider the similarities and differences in connection control between Wi-Fi roaming and cellular handover mechanisms. Cellular networks may implement a variety of handover protocols to transfer a mobile station between source and target cells, ranging from network-controlled to mobile-station-controlled depending on the standard being implemented (AMPS, CDMA, GSM, etc.). Modern cellular networks typically rely on decentralized handover, similar to Wi-Fi, but define key enhancements to ensure connection reliability. Soft handover in CDMA networks allows a mobile station to establish a connection to the target cell before breaking the connection to the source cell, thereby reducing the chance of service disruption. Standards such as those from 3GPP, which defines GSM and LTE networks, specify that handover triggering (section III) is defined by the network core but implemented by mobile stations (user equipment) to improve consistency and performance. Finally, mobile network operators (MNOs) perform rigorous and thorough testing of every mobile phone before certification is granted for activation on their networks (the GCF is one example).

Note – Wi-Fi roaming is most comparable to cellular handover. In contrast, cellular roaming refers to service acquisition outside of the subscriber’s home location or network provider, and should not be confused with Wi-Fi roaming.

Wi-Fi engineers should take away a few concepts from this comparison. First, soft handover is likely not realistic for Wi-Fi networks due to typical enterprise multi-channel architectures based on frequency division of adjacent APs (similar to GSM). Second, standardized handover triggering is within the realm of possibility, and central definition of trigger mechanisms is feasible with modern coordinated Wi-Fi architectures (typically involving a controller, but not required). However, the need for such standardization will have to become much more apparent before the IEEE or Wi-Fi Alliance considers action. Perhaps the industry will begin discussing such measures as MNOs take more prominent roles in Wi-Fi standards and certification processes due to carrier Wi-Fi adoption.

Perhaps the most important takeaway is the approach to endpoint certification implemented by mobile network operators. By taking control of endpoint certification prior to activation and use on the network, MNOs more tightly control their network ecosystem to achieve desired performance levels. Wi-Fi networks will never be able to achieve such levels of control due to the use of unlicensed spectrum. However, Wi-Fi network administrators can (and should) implement similarly rigorous client testing and verification procedures to optimize network performance.

Importance of Wi-Fi Roaming Analysis
Consider that modern wireless networks require high performance to concurrently support voice, data, and real-time video; high capacity to support an influx of mobile Internet devices; and ultra-low latency to support vertical industry solutions such as automated warehouses, robotics, and medical instrumentation.

Wi-Fi network design and optimization is a complex undertaking, with numerous features, configuration options, and environmental variables that can make achieving a high performance network difficult. Roaming analysis provides insight into how decisions made on wireless architecture, network design, client selection, and configuration impact overall network performance.

Performing Wi-Fi roaming analysis will enable network architects and engineers to:

  1. Baseline current client roaming performance
  2. Analyze gaps between current network performance and application requirements
  3. Identify opportunities to improve and optimize performance
  4. Implement changes to infrastructure and client devices to optimize performance
  5. Take more active control to ensure network performance matches desired service levels
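The first two steps, baselining and gap analysis, come down to summarizing measured roam durations against what the applications need. The sketch below assumes you have already extracted per-roam durations in milliseconds from your own captures; `roam_baseline` is a hypothetical helper, the nearest-rank 95th percentile is one of several reasonable percentile definitions, and both the sample durations and the 150 ms requirement in the usage line are illustrative placeholders, not measurements or recommendations.

```python
import math
import statistics
from typing import Dict, Sequence


def roam_baseline(roam_ms: Sequence[float], requirement_ms: float) -> Dict[str, float]:
    """Summarize measured roam durations and compare them to an application requirement."""
    if not roam_ms:
        raise ValueError("need at least one measured roam")
    ordered = sorted(roam_ms)
    p95_index = math.ceil(0.95 * len(ordered)) - 1   # nearest-rank 95th percentile
    over = sum(1 for d in ordered if d > requirement_ms)
    return {
        "count": len(ordered),
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[p95_index],
        "max_ms": ordered[-1],
        "pct_over_requirement": 100.0 * over / len(ordered),
    }


# Illustrative values only; real durations would come from your own captures.
summary = roam_baseline([40, 45, 50, 60, 300], requirement_ms=150)
print(summary["median_ms"], summary["p95_ms"], summary["pct_over_requirement"])  # 50 300 20.0
```

Reporting a tail percentile alongside the median matters here: a single slow roam (often a full 802.1X re-authentication) can disrupt a voice call even when typical roams look healthy.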

Be sure to check back for the next article in this series, which will cover the complexity introduced by security protocols and the many resulting variations of wireless roaming.


Many thanks to Marcus Burton at CWNP for technical review and contribution to this post!

Andrew vonNagy

Technical Architect at a Fortune 50 Company CCIE #28298 CWNE #81