Modifying heartbeat timing
If the FortiGate-6000s in an HA cluster do not receive heartbeat packets on time, each FortiGate-6000 may determine that the other FortiGate-6000 has failed. HA heartbeat packets may not be sent on time because of network issues, for example, if the HA1 and HA2 communication links between the FortiGate-6000s become too busy to handle the heartbeat traffic. Also, in a distributed clustering configuration, the round trip time (RTT) between the FortiGate-6000s may be longer than the expected time between heartbeat packets.
In addition, if the FortiGate-6000s become excessively busy, they may delay sending heartbeat packets.
Even with these delays, the FortiGate-6000 HA cluster can continue to function normally as long as the HA heartbeat configuration supports longer delays between heartbeat packets and more missed heartbeat packets.
You can use the following commands to configure heartbeat timing:
config system ha
set hb-interval <interval_integer>
set hb-lost-threshold <threshold_integer>
set hello-holddown <holddown_integer>
end
Changing the heartbeat interval
The heartbeat interval is the time between sending HA heartbeat packets. The heartbeat interval is set in units of 100 ms, and the range is 1 to 20 (100 ms to 2000 ms). The default heartbeat interval is 2 (200 ms).
A heartbeat interval of 2 means the time between heartbeat packets is 200 ms. Changing the heartbeat interval to 5 changes the time between heartbeat packets to 500 ms (5 * 100 ms = 500 ms).
Use the following CLI command to increase the heartbeat interval to 10:
config system ha
set hb-interval 10
end
Changing the lost heartbeat threshold
The lost heartbeat threshold is the number of consecutive heartbeat packets that a FortiGate-6000 does not receive before assuming that a failure has occurred. The default value of 6 means that if a FortiGate-6000 does not receive 6 consecutive heartbeat packets, it determines that the other FortiGate-6000 in the cluster has failed. The range is 1 to 60 packets.
The lower the hb-lost-threshold, the faster a FortiGate-6000 HA configuration responds when a failure occurs. However, sometimes heartbeat packets may not be received because the other FortiGate-6000 is very busy or because of network conditions. This can lead to a false positive failure detection. To reduce these false positives you can increase the hb-lost-threshold.
Use the following command to increase the lost heartbeat threshold to 12:
config system ha
set hb-lost-threshold 12
end
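The trade-off above can be turned into a rough sizing rule. The following Python sketch is illustrative only and is not a Fortinet tool: it assumes a failure is declared once the lost heartbeat threshold multiplied by the heartbeat interval has elapsed with no heartbeat packets, and it returns the smallest hb-lost-threshold whose waiting time exceeds the longest heartbeat gap you want to tolerate. The function name min_lost_threshold and the longest_gap_ms parameter are hypothetical.

```python
def min_lost_threshold(hb_interval: int, longest_gap_ms: int) -> int:
    """Smallest hb-lost-threshold whose waiting time exceeds longest_gap_ms.

    Assumption: a failure is declared after
    hb-lost-threshold * hb-interval * 100 ms without heartbeat packets.
    """
    if not 1 <= hb_interval <= 20:
        raise ValueError("hb-interval must be between 1 and 20")
    if longest_gap_ms < 0:
        raise ValueError("longest_gap_ms must not be negative")
    interval_ms = hb_interval * 100  # hb-interval is set in units of 100 ms
    threshold = longest_gap_ms // interval_ms + 1
    if threshold > 60:
        raise ValueError("gap too long for hb-lost-threshold range 1 to 60; "
                         "increase hb-interval instead")
    return threshold


# With the default 200 ms interval, tolerating a 2-second gap needs 11 packets.
print(min_lost_threshold(2, 2000))
```

A larger result trades slower failure detection for fewer false positives, so the value is a starting point to validate against real traffic, not a recommendation.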
Adjusting the heartbeat interval and lost heartbeat threshold
The heartbeat interval combines with the lost heartbeat threshold to set how long a FortiGate-6000 waits before assuming that the other FortiGate-6000 has failed and is no longer sending heartbeat packets. By default, if a FortiGate-6000 does not receive a heartbeat packet from a cluster unit for 6 * 200 = 1200 milliseconds or 1.2 seconds the FortiGate-6000 assumes that the other FortiGate-6000 has failed.
You can increase both the heartbeat interval and the lost heartbeat threshold to reduce false positives. For example, increasing the heartbeat interval to 20 (2000 ms) and the lost heartbeat threshold to 30 means a failure is assumed if no heartbeat packets are received for 30 * 2000 milliseconds = 60,000 milliseconds, or 60 seconds.
Use the following command to increase the heartbeat interval to 20 and the lost heartbeat threshold to 30:
config system ha
set hb-interval 20
set hb-lost-threshold 30
end
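The arithmetic in this section can be checked with a few lines of Python. This is a minimal sketch, not part of FortiOS: the function name failure_detection_ms is hypothetical, and the ranges and the 100 ms unit are taken from the text above.

```python
HB_INTERVAL_UNIT_MS = 100  # one hb-interval unit is 100 ms


def failure_detection_ms(hb_interval: int = 2, hb_lost_threshold: int = 6) -> int:
    """Time in ms a unit waits without heartbeats before assuming a failure."""
    if not 1 <= hb_interval <= 20:
        raise ValueError("hb-interval must be between 1 and 20")
    if not 1 <= hb_lost_threshold <= 60:
        raise ValueError("hb-lost-threshold must be between 1 and 60")
    return hb_interval * HB_INTERVAL_UNIT_MS * hb_lost_threshold


print(failure_detection_ms())        # defaults: 6 * 200 ms = 1200
print(failure_detection_ms(20, 30))  # 30 * 2000 ms = 60000
```

Note that the range check rejects an hb-interval of 30, which is a quick way to catch the two values being entered the wrong way round.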
Changing the time to wait in the hello state
The hello state hold-down time is the number of seconds that a FortiGate-6000 waits before changing from the hello state to the work state. After a failure or when starting up, FortiGate-6000s in HA mode operate in the hello state to send and receive heartbeat packets, find each other, and form a cluster. A FortiGate-6000 should change from the hello state to the work state after it finds the FortiGate-6000 to form a cluster with. If for some reason the FortiGate-6000s cannot find each other during the hello state, each FortiGate-6000 may assume that the other one has failed and form its own cluster of one FortiGate-6000. The FortiGate-6000s could eventually find each other and negotiate to form a cluster, possibly causing a network interruption as they re-negotiate.
The FortiGate-6000s may be slow to find each other if they are located at different sites or if communication between the heartbeat interfaces is delayed for some other reason. If you find that your FortiGate-6000s leave the hello state before finding each other, you can increase the time that they wait in the hello state. The hello state hold-down time range is 5 to 300 seconds. The default hello state hold-down time is 20 seconds.
Use the following command to increase the time to wait in the hello state to 1 minute (60 seconds):
config system ha
set hello-holddown 60
end
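If you generate these commands from a script, it can help to check each value against its documented range first. The following Python sketch is illustrative only: the helper name render_ha_timing and the RANGES table are hypothetical, the ranges are the ones stated in this section, and the output is plain text to paste into the CLI, not an API call to the FortiGate-6000.

```python
# Ranges as stated in this section: (minimum, maximum)
RANGES = {
    "hb-interval": (1, 20),        # units of 100 ms
    "hb-lost-threshold": (1, 60),  # packets
    "hello-holddown": (5, 300),    # seconds
}


def render_ha_timing(settings: dict) -> str:
    """Validate heartbeat timing values and return a 'config system ha' block."""
    lines = ["config system ha"]
    for name, value in settings.items():
        if name not in RANGES:
            raise ValueError(f"unknown setting: {name}")
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
        lines.append(f"set {name} {value}")
    lines.append("end")
    return "\n".join(lines)


print(render_ha_timing({
    "hb-interval": 20,
    "hb-lost-threshold": 30,
    "hello-holddown": 60,
}))
```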