Routing graceful restart
When an HA failover occurs, neighbor routers will detect that the cluster has failed and remove it from the network until the routing topology stabilizes. During that time the routers may stop sending IP packets to the cluster, and communication sessions that would normally be processed by the cluster may time out or be dropped. In addition, the new primary unit will not receive routing updates, so it will not be able to build and maintain its routing database.
You can solve this problem by configuring graceful restart for the dynamic routing protocols that you are using. This section describes configuring graceful restart for OSPF and BGP.
To support graceful restart you should make sure the new primary unit keeps its synchronized routing data long enough to acquire new routing data. You should also increase the HA route time to live, route wait, and route hold values to 60 using the following CLI command:
config system ha
set route-ttl 60
set route-wait 60
set route-hold 60
end
Graceful OSPF restart
You can configure graceful restart (also called nonstop forwarding (NSF)), as described in RFC 3623 (Graceful OSPF Restart), to solve the problem of dynamic routing failover. If graceful restart is enabled on neighbor routers, they will keep sending packets to the cluster following the HA failover instead of removing it from the network. The neighboring routers assume that the cluster is experiencing a graceful restart.
After the failover, the new primary unit can continue to process communication sessions using the synchronized routing data received from the failed primary unit before the failover. This gives the new primary unit time to update its routing table after the failover.
You can use the following commands to enable graceful restart or NSF on Cisco routers:
router ospf 1
log-adjacency-changes
nsf ietf helper strict-lsa-checking
If the cluster is running OSPF, use the following command to enable graceful restart for OSPF:
config router ospf
set restart-mode graceful-restart
end
Graceful BGP restart
If the cluster is running BGP, only the primary unit keeps BGP peering connections. When a failover occurs, the BGP peering needs to be reestablished. If you enable BGP graceful restart, the adjacent routers keep the routes active while the new primary unit restarts the BGP peering.
Enabling BGP graceful restart causes the FortiGate BGP process to restart, which can temporarily disrupt traffic through the cluster. You should normally wait for a quiet time or a maintenance period to enable BGP graceful restart.
Use the following command to enable graceful restart for BGP and set some graceful restart options.
config router bgp
set graceful-restart enable
set graceful-restart-time 120
set graceful-stalepath-time 360
set graceful-update-delay 120
end
Notifying BGP neighbors when graceful restart is enabled
You can add BGP neighbors and configure the cluster unit to notify these neighbors that it supports graceful restart.
config router bgp
config neighbor
edit <neighbor_address_Ipv4>
set capability-graceful-restart enable
end
end
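As an illustration, the following sketch enables the graceful restart capability for a single neighbor. The neighbor address 172.20.121.23 and the AS number 65001 are hypothetical values chosen for this example; substitute the values for your network. The set remote-as line is assumed to be needed only when you are creating the neighbor rather than editing an existing one.
config router bgp
config neighbor
edit 172.20.121.23
set remote-as 65001
set capability-graceful-restart enable
next
end
end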
Bidirectional Forwarding Detection (BFD) enabled BGP graceful restart
You can add a BFD enabled BGP neighbor as a static BFD neighbor using the following command. This example shows how to add a BFD neighbor with IP address 172.20.121.23 that is on the network connected to port4:
config router bfd
config neighbor
edit 172.20.121.23
set interface port4
end
end
The FGCP supports graceful restart of BFD enabled BGP neighbors. The config router bfd command is needed as the BGP auto-start timer is 5 seconds. After HA failover, BGP on the new primary unit has to wait for 5 seconds to connect to its neighbors, and then register BFD requests after establishing the connections. With static BFD neighbors, BFD requests and sessions can be created as soon as possible after the failover. The new get router info bfd requests command shows the BFD peer requests.
A BFD session created for a static BFD neighbor/peer request will initialize its state as "INIT" instead of "DOWN" and its detection time as bfd-required-min-rx * bfd-detect-mult milliseconds.
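As a worked example of the detection time calculation, assume a bfd-required-min-rx of 250 milliseconds and a bfd-detect-mult of 3. These are illustrative values only; check the values configured on your own interfaces.
detection time = bfd-required-min-rx * bfd-detect-mult = 250 * 3 = 750 milliseconds
With these values, the session would be declared down if no BFD control packet arrives from the neighbor within 750 milliseconds.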
When a BFD control packet with nonzero your_discr is received, if no session can be found to match the your_discr, instead of discarding the packet, other fields in the packet, such as addressing information, are used to choose one session that was just initialized, with zero as its remote discriminator.
When a BFD session in the up state receives a control packet with zero as your_discr and down as the state, the session will change its state to down but will not notify BGP or other registered clients of this down event.