Non-Stop Routing Gaining Ground

By Mark Seery, Vice President

In 2002, the conventional wisdom was that non-stop forwarding (aka "graceful restart") would be the best approach to achieving high availability. (See "IP High Availability-The Journey is Under Way," Insight, August 2002.) Back then, Cisco led the charge on non-stop forwarding and demonstrated a compelling capability that was a big improvement on what routers had traditionally been able to do. Other vendors, such as Juniper and Redback, soon followed.

The industry was split, however. Companies like Alcatel, Avici, Chiaro, and Pluris (no longer with us) were pursuing non-stop routing (aka "stateful failover"). These companies were concerned that non-stop forwarding could not prevent disruption to services and that, while the backup control plane was restarting, forwarding might continue along the wrong path if the network topology changed during the restart. At that time, there was also a risk that the installed base of Cisco 7xxx edge routers might not react well to the extra processing imposed on them by the neighbor-assisted learning process of non-stop forwarding. (In fact, some modifications were made to the BGP4 update process to pace it better and make it more efficient.)

Pursuing non-stop routing was very controversial in 2002, as many engineers believed the approach could not scale. In 2005, we still don't know whether non-stop routing can scale in real-world production environments, but BT Exact recently tested Alcatel's implementation on the 7750 and said the following:

The results clearly demonstrate a significant reduction in switchover times (ten times faster than competitive edge router platforms) and a minimization or elimination of service interruption in the event of an active to standby Switching Fabric/Control Processing Module (SF/CPM) switchover.
The times for software and hardware induced switchover events were measured to be <2.3µs and <4.1ms, respectively, for supported protocols and services. All peerings and adjacencies, including those to 3rd party router platforms, were unaffected by the switchover events. No protocol extensions or helpers were required. The testing included large-scale configurations as well as active and dynamic routing to simulate full-loaded conditions.

In one MPLS/BGP VPN test that BT Exact ran, there were 1,001 BGP4 peering sessions. We still don't know if this approach scales in the real world, but we now have data points showing that it scales in lab tests.

A review of some of the newer SIP-based architectures for voice suggests that if you use stateful protocols, you are probably condemning yourself to a stateful failover approach. TCP, which BGP4 uses, is stateful. In addition, a topology/path database is a type of state as well. A few years back, Packet Design recommended a way of avoiding TCP that would have removed some of this problem, but even if that had been done, the topology data would still be state. As much as the end-to-end principle prescribes no state in a network, the current approach to routing always creates state, which is something to think about as we look at future network architectures.

Is graceful restart fatally flawed in light of stateful protocols and topology? Not fatally, but putting an extra processing burden on neighbors is not good unless you can guarantee that the load won't cause other problems. In addition, the technology has to allow the implementation to restart the new control plane before topology changes occur. Adding even more interoperability issues just raises the bar to innovation. Non-stop forwarding is a tremendous improvement, but the question remains whether it is sufficient.

The Bottom Line

High availability and reliability are maturing and are required for all services, whether residential or business.
Non-stop routing is gaining support among vendors, and all router vendors should look at it closely.

Mark Seery has 22 years of industry experience, covering both private and public networks, in the areas of network operations, software development, product management, and product marketing, with an interest in network architectures for all aspects of public networks. Before joining Ovum-RHK, Mark spent 10 years in senior product management and marketing roles at a range of start-ups and established companies, including Valo Systems, Bravara Communications, BroadBand Technologies/Pliant Systems, Cisco Systems, Wandel & Goltermann, and NetEdge Systems.