Failover protection
The FortiGate Clustering Protocol (FGCP) provides failover protection, meaning that a cluster can provide FortiGate services even when one of the devices in the cluster encounters a problem that would result in the complete loss of connectivity for a stand-alone FortiGate unit. Failover protection provides a backup mechanism that can be used to reduce the risk of unexpected downtime, especially in mission-critical environments.
FGCP can trigger a failover in four situations:
- If a link fails.
- If a device loses power.
- If an SSD fails.
- If memory utilization exceeds the threshold for a specified amount of time.
When session-pickup is enabled in the HA settings, existing TCP sessions are kept after a failover. Users on the network are not affected by the downtime, because traffic continues to pass without the sessions having to be reestablished.
When and how the failover happens
1. Link fails
For a link failure to trigger a failover, the administrator must first configure the interfaces to be monitored. Typically, the internal interface that connects to the internal network and the outgoing interface that carries traffic to the internet or outside network are monitored. If any monitored link goes down, a failover is triggered.
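The link rule above can be modelled in a few lines. This is a simplified, hypothetical sketch and not FGCP's actual primary selection logic: the `should_fail_over` helper receives the link state of each interface on the active unit plus the list of monitored interfaces, and reports a failover as soon as any monitored link is down. Interface names such as `port1` are placeholders.

```python
from typing import Mapping, Sequence


def should_fail_over(link_up: Mapping[str, bool], monitored: Sequence[str]) -> bool:
    """Return True if any monitored interface on the active unit is down.

    Hypothetical, simplified model of the rule described above. Interfaces
    that are not in `monitored` are ignored, so their failure never triggers
    a failover. A monitored interface missing from `link_up` counts as down.
    """
    return any(not link_up.get(name, False) for name in monitored)


# port2 is monitored and down; port3 is down but not monitored.
status = {"port1": True, "port2": False, "port3": False}
print(should_fail_over(status, ["port1", "port2"]))  # True
print(should_fail_over(status, ["port1"]))           # False
```

The second call shows why monitoring matters: port3 is down, but because it is not in the monitored list, no failover is reported.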
2. Loss of power to the active unit
When the active (primary) unit loses power, a backup (secondary) unit automatically becomes the active unit, and the impact on traffic is minimal. No settings are required for this type of failover.
3. SSD failure
An HA failover can be triggered by an SSD failure.
To enable HA failover when an SSD fails:
config system ha
    set ssd-failover enable
end
4. Memory utilization
An HA failover can be triggered when memory utilization exceeds the threshold for a specific amount of time.
Memory utilization is checked at the configured sample rate (memory-failover-sample-rate). If the utilization is above the threshold (memory-failover-threshold) every time it is sampled for the entire monitor period (memory-failover-monitor-period), a failover is triggered.
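The sampling rule can be sketched in Python as follows. The sketch assumes a list of memory usage percentages, oldest first, with one sample taken every `sample_rate` seconds; the function name `memory_failover_due` and the way samples are collected are hypothetical. A failover is reported only when every sample in the most recent monitor period is above the threshold.

```python
from typing import Sequence


def memory_failover_due(samples: Sequence[float], threshold: float,
                        monitor_period: int, sample_rate: int) -> bool:
    """Return True if every sample in the last monitor period is above threshold.

    Hypothetical model. samples: memory usage in percent, oldest first, one
    every sample_rate seconds. threshold: effective threshold in percent; a
    configured value of 0 would first have to be replaced by the conserve
    mode threshold, which is not modelled here.
    """
    # Number of samples that span one monitor period (at least one).
    needed = max(1, monitor_period // sample_rate)
    if len(samples) < needed:
        return False  # not enough history yet to cover a full monitor period
    window = samples[-needed:]
    # One sample at or below the threshold in the window prevents a failover.
    return all(usage > threshold for usage in window)


# Default monitor period (60 s) and sample rate (1 s), assumed 90% threshold.
print(memory_failover_due([91.0] * 60, 90, 60, 1))           # True
print(memory_failover_due([91.0] * 59 + [89.0], 90, 60, 1))  # False
```

Requiring every sample to exceed the threshold, rather than an average, means a brief dip below the threshold is enough to hold off the failover.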
If the FortiGate meets the memory utilization conditions for a failover, but its last memory triggered failover happened within the timeout period (memory-failover-flip-timeout), the failover does not occur. Other HA cluster members can still trigger memory based failovers if they meet the criteria and have not already failed over within the timeout period.
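Because the flip timeout applies to each cluster member separately, it can be modelled as a per-member record of the last memory based failover. The `FlipTimeoutGuard` class below is hypothetical: member names are placeholders, and times are plain seconds on a clock supplied by the caller. Treating a failover exactly at the end of the timeout as allowed is an assumption.

```python
from typing import Dict, Optional


class FlipTimeoutGuard:
    """Hypothetical model: tracks the last memory based failover per member."""

    def __init__(self, flip_timeout_minutes: int) -> None:
        self.flip_timeout_s = flip_timeout_minutes * 60
        self.last_failover: Dict[str, float] = {}

    def allow(self, member: str, now_s: float) -> bool:
        """Return True if `member` may trigger a memory based failover at now_s."""
        last: Optional[float] = self.last_failover.get(member)
        if last is not None and now_s - last < self.flip_timeout_s:
            return False  # this member already failed over within the timeout
        return True

    def record(self, member: str, now_s: float) -> None:
        """Remember that `member` triggered a memory based failover at now_s."""
        self.last_failover[member] = now_s


guard = FlipTimeoutGuard(flip_timeout_minutes=6)  # default flip timeout
guard.record("FGT-A", 0)
print(guard.allow("FGT-A", 120))  # False: only 2 minutes since FGT-A failed over
print(guard.allow("FGT-B", 120))  # True: FGT-B has no recent failover
print(guard.allow("FGT-A", 360))  # True: the 6-minute timeout has elapsed
```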
After a memory based failover from FortiGate A to FortiGate B, if the memory usage on FortiGate A drops below the threshold and the memory usage on FortiGate B is also still below the threshold, a failover is not triggered, because the cluster is working normally with FortiGate B as the primary device.
When you disable memory based failover, a new HA primary selection occurs to determine the primary device.
To configure memory based HA failover:
config system ha
    set memory-based-failover {enable | disable}
    set memory-failover-threshold <integer>
    set memory-failover-monitor-period <integer>
    set memory-failover-sample-rate <integer>
    set memory-failover-flip-timeout <integer>
end
memory-based-failover {enable | disable}
    Enable or disable memory based failover (default = disable).
memory-failover-threshold <integer>
    The memory usage threshold that triggers a memory based failover, as a percentage (0 - 95, 0 = use the conserve mode threshold, default = 0).
memory-failover-monitor-period <integer>
    How long memory usage must stay above the threshold before a memory based failover is triggered, in seconds (1 - 300, default = 60).
memory-failover-sample-rate <integer>
    How often memory usage is sampled, in seconds (1 - 60, default = 1).
memory-failover-flip-timeout <integer>
    The time to wait between subsequent memory based failovers, in minutes (6 - 2147483647, default = 6).
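Before entering the commands, a proposed set of values can be checked against the ranges in the parameter table. The `validate` helper below is a hypothetical sketch that encodes only the documented ranges and defaults; it does not connect to a FortiGate, and the dictionary keys simply reuse the CLI option names.

```python
from typing import Dict, List, Tuple

# Hypothetical helper data: ranges and defaults copied from the parameter table.
RANGES: Dict[str, Tuple[int, int]] = {
    "memory-failover-threshold": (0, 95),              # percent, 0 = conserve mode
    "memory-failover-monitor-period": (1, 300),        # seconds
    "memory-failover-sample-rate": (1, 60),            # seconds
    "memory-failover-flip-timeout": (6, 2147483647),   # minutes
}
DEFAULTS: Dict[str, int] = {
    "memory-failover-threshold": 0,
    "memory-failover-monitor-period": 60,
    "memory-failover-sample-rate": 1,
    "memory-failover-flip-timeout": 6,
}


def validate(settings: Dict[str, int]) -> List[str]:
    """Return a list of problems; an empty list means every value is in range."""
    problems: List[str] = []
    merged = {**DEFAULTS, **settings}  # unspecified options use their defaults
    for name, value in merged.items():
        if name not in RANGES:
            problems.append(f"unknown option: {name}")
            continue
        low, high = RANGES[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} outside {low}-{high}")
    return problems


print(validate({"memory-failover-threshold": 80}))  # []
print(validate({"memory-failover-flip-timeout": 5}))
```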
Configuring HA failover time
On supported models, the HA heartbeat interval unit can be changed from the 100ms default to 10ms. This allows for a failover time of less than 50ms, depending on the configuration and the network.
config system ha
    set hb-interval-in-milliseconds {100ms | 10ms}
end
In this example, the HA heartbeat interval unit is changed from 100ms to 10ms. Because the heartbeat interval (hb-interval) is 2, its default value, a heartbeat is sent every 20ms. The number of lost heartbeats that signal a failure (hb-lost-threshold) is also set to 2. So, after two consecutive heartbeats are lost, a failure is detected in 40ms.
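The arithmetic in this example can be checked with a short calculation. The hypothetical functions below multiply hb-interval by its unit to get the time between heartbeats, then multiply that by hb-lost-threshold to get the time until the peer is declared lost. They ignore processing and network delays, so an actual failover can take somewhat longer.

```python
def heartbeat_period_ms(hb_interval: int, unit_ms: int) -> int:
    """Time between heartbeats: hb-interval counted in units of unit_ms."""
    return hb_interval * unit_ms


def detection_time_ms(hb_interval: int, unit_ms: int, hb_lost_threshold: int) -> int:
    """Time until hb_lost_threshold consecutive heartbeats have been missed."""
    return heartbeat_period_ms(hb_interval, unit_ms) * hb_lost_threshold


# Values from the example: hb-interval 2, 10ms unit, hb-lost-threshold 2.
print(heartbeat_period_ms(2, 10))   # 20
print(detection_time_ms(2, 10, 2))  # 40
```

With the default 100ms unit and the same hb-interval and hb-lost-threshold, `detection_time_ms(2, 100, 2)` gives 400, which shows why the 10ms unit is needed to reach failover times below 50ms.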
To configure the HA failover time:
config system ha
    set group-id 240
    set group-name "300D"
    set mode a-p
    set hbdev "port3" 50 "port5" 100
    set hb-interval 2
    set hb-interval-in-milliseconds 10ms
    set hb-lost-threshold 2
    set override enable
    set priority 200
end