Modifying heartbeat timing
If the FortiGate-7000Fs in an HA cluster do not receive heartbeat packets on time, each FortiGate-7000F may determine that the other FortiGate-7000F has failed. HA heartbeat packets may not be sent on time because of network issues, for example, if the HA heartbeat communication links between the FortiGate-7000Fs become too busy to handle the heartbeat traffic. Also, in a distributed clustering configuration, the round trip time (RTT) between the FortiGate-7000Fs may be longer than the expected time between heartbeat packets.
In addition, if the FortiGate-7000Fs become excessively busy, they may delay sending heartbeat packets.
Even with these delays, the FortiGate-7000F HA cluster can continue to function normally as long as the HA heartbeat configuration supports longer delays between heartbeat packets and more missed heartbeat packets.
You can use the following commands to configure heartbeat timing:
config system ha
set hb-interval <interval_integer>
set hb-lost-threshold <threshold_integer>
set hello-holddown <holddown_integer>
end
Changing the heartbeat interval
The heartbeat interval is the time between sending HA heartbeat packets. The heartbeat interval is set in units of 100 ms and the range is 1 to 20 (100 ms to 2000 ms). The heartbeat interval default is 2 (200 ms).
A heartbeat interval of 2 means the time between heartbeat packets is 200 ms. Changing the heartbeat interval to 5 changes the time between heartbeat packets to 500 ms (5 * 100ms = 500ms).
Use the following CLI command to increase the heartbeat interval to 10:
config system ha
set hb-interval 10
end
Changing the lost heartbeat threshold
The lost heartbeat threshold is the number of consecutive heartbeat packets that a FortiGate-7000F does not receive before assuming that a failure has occurred. The default value of 6 means that if a FortiGate-7000F does not receive 6 consecutive heartbeat packets, it determines that the other FortiGate-7000F in the cluster has failed. The range is 1 to 60 packets.
The lower the hb-lost-threshold, the faster a FortiGate-7000F HA configuration responds when a failure occurs. However, sometimes heartbeat packets may not be received because the other FortiGate-7000F is very busy or because of network conditions. This can lead to a false positive failure detection. To reduce these false positives, you can increase the hb-lost-threshold.
Use the following command to increase the lost heartbeat threshold to 12:
config system ha
set hb-lost-threshold 12
end
Adjusting the heartbeat interval and lost heartbeat threshold
The heartbeat interval combines with the lost heartbeat threshold to set how long a FortiGate-7000F waits before assuming that the other FortiGate-7000F has failed and is no longer sending heartbeat packets. By default, if a FortiGate-7000F does not receive a heartbeat packet from a cluster unit for 6 * 200 = 1200 milliseconds or 1.2 seconds the FortiGate-7000F assumes that the other FortiGate-7000F has failed.
You can increase both the heartbeat interval and the lost heartbeat threshold to reduce false positives. For example, increasing the heartbeat interval to 20 (20 * 100 ms = 2000 ms) and the lost heartbeat threshold to 30 means a failure will be assumed if no heartbeat packets are received after 30 * 2000 milliseconds = 60,000 milliseconds, or 60 seconds.
Use the following command to increase the heartbeat interval to 20 and the lost heartbeat threshold to 30:
config system ha
set hb-interval 20
set hb-lost-threshold 30
end
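The arithmetic in this section can be sketched in a few lines of Python. This is an illustrative calculation only, not a Fortinet tool: the function names detection_time_ms and min_lost_threshold are hypothetical, and the sketch assumes that a failure is declared after hb-lost-threshold consecutive intervals of hb-interval * 100 ms, as described above.

```python
HB_UNIT_MS = 100  # hb-interval is expressed in units of 100 ms


def detection_time_ms(hb_interval: int, hb_lost_threshold: int) -> int:
    """Time without heartbeats before a peer is assumed to have failed."""
    return hb_interval * HB_UNIT_MS * hb_lost_threshold


def min_lost_threshold(hb_interval: int, tolerated_delay_ms: int) -> int:
    """Smallest threshold whose detection time covers tolerated_delay_ms.

    Hypothetical helper: tolerated_delay_ms is whatever worst-case heartbeat
    delay you want the cluster to ride out without declaring a failure.
    """
    interval_ms = hb_interval * HB_UNIT_MS
    # Integer ceiling division, never below the documented minimum of 1.
    return max(1, -(-tolerated_delay_ms // interval_ms))


if __name__ == "__main__":
    print(detection_time_ms(2, 6))      # defaults: 1200 ms
    print(detection_time_ms(20, 30))    # 60000 ms
    print(min_lost_threshold(5, 3000))  # 6
```

A result from min_lost_threshold still has to fall within the hb-lost-threshold range of 1 to 60 before it can be configured.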
Changing the time to wait in the hello state
The hello state hold-down time is the number of seconds that a FortiGate-7000F waits before changing from hello state to work state. After a failure or when starting up, FortiGate-7000Fs in HA mode operate in the hello state to send and receive heartbeat packets to find each other and form a cluster. A FortiGate-7000F should change from the hello state to work state after it finds the FortiGate-7000F to form a cluster with. If for some reason the FortiGate-7000Fs cannot find each other during the hello state, both FortiGate-7000Fs may assume that the other one has failed and each could form a separate cluster of one FortiGate-7000F. The FortiGate-7000Fs could eventually find each other and negotiate to form a cluster, possibly causing a network interruption as they re-negotiate.
The FortiGate-7000Fs may be slow to find each other if they are located at different sites or if communication between the heartbeat interfaces is delayed for some other reason. If you find that your FortiGate-7000Fs leave the hello state before finding each other, you can increase the time that they wait in the hello state. The hello state hold-down time range is 5 to 300 seconds. The hello state hold-down time default is 20 seconds.
Use the following command to increase the time to wait in the hello state to 1 minute (60 seconds):
config system ha
set hello-holddown 60
end
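If you generate these settings from a script, the ranges given in this section can be checked before the commands are pasted into the CLI. The following Python sketch is one way to do that. The RANGES table and the render_ha_timing function are hypothetical names used for illustration, and the sketch only builds the command text; it does not connect to a FortiGate-7000F.

```python
# Ranges as documented in this section.
RANGES = {
    "hb-interval": (1, 20),        # units of 100 ms
    "hb-lost-threshold": (1, 60),  # consecutive missed heartbeat packets
    "hello-holddown": (5, 300),    # seconds
}


def render_ha_timing(settings: dict) -> str:
    """Validate heartbeat timing values and return the CLI commands as text."""
    lines = ["config system ha"]
    for option, value in settings.items():
        if option not in RANGES:
            raise ValueError(f"unknown option: {option}")
        low, high = RANGES[option]
        if not low <= value <= high:
            raise ValueError(f"{option} must be {low} to {high}, got {value}")
        lines.append(f"set {option} {value}")
    lines.append("end")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_ha_timing({"hb-interval": 20, "hb-lost-threshold": 30}))
```

A value outside its range, such as an hb-interval of 30, raises ValueError instead of producing a command that the CLI would reject.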