Modifying heartbeat timing
If the FortiGate-7000Es in an HA cluster do not receive heartbeat packets on time, each FortiGate-7000E may determine that the other FortiGate-7000E has failed. HA heartbeat packets may not be sent on time because of network issues. For example, the M1 and M2 communication links between the FortiGate-7000Es may become too busy to handle the heartbeat traffic. Also, in a distributed clustering configuration, the round trip time (RTT) between the FortiGate-7000Es may be longer than the expected time between heartbeat packets.
In addition, if the FortiGate-7000Es become excessively busy, they may delay sending heartbeat packets.
Even with these delays, the FortiGate-7000E HA cluster can continue to function normally as long as the HA heartbeat configuration supports longer delays between heartbeat packets and more missed heartbeat packets.
You can use the following commands to configure heartbeat timing:
config system ha
set hb-interval <interval_integer>
set hb-lost-threshold <threshold_integer>
set hello-holddown <holddown_integer>
end
Changing the heartbeat interval
The heartbeat interval is the time between sending HA heartbeat packets. The heartbeat interval is set in units of 100 ms. The range is 1 to 20 (100 ms to 2000 ms) and the default is 2 (200 ms).
For example, changing the heartbeat interval from 2 to 5 changes the time between heartbeat packets from 200 ms to 500 ms (5 * 100 ms = 500 ms).
Use the following CLI command to increase the heartbeat interval to 10:
config system ha
set hb-interval 10
end
Changing the lost heartbeat threshold
The lost heartbeat threshold is the number of consecutive heartbeat packets that a FortiGate-7000E does not receive before assuming that a failure has occurred. The default value of 6 means that if a FortiGate-7000E does not receive 6 consecutive heartbeat packets, it determines that the other FortiGate-7000E in the cluster has failed. The range is 1 to 60 packets.
The lower the hb-lost-threshold, the faster a FortiGate-7000E HA configuration responds when a failure occurs. However, sometimes heartbeat packets may not be received because the other FortiGate-7000E is very busy or because of network conditions. This can lead to a false positive failure detection. To reduce these false positives you can increase the hb-lost-threshold.
Use the following command to increase the lost heartbeat threshold to 12:
config system ha
set hb-lost-threshold 12
end
Adjusting the heartbeat interval and lost heartbeat threshold
The heartbeat interval combines with the lost heartbeat threshold to set how long a FortiGate-7000E waits before assuming that the other FortiGate-7000E has failed and is no longer sending heartbeat packets. By default, if a FortiGate-7000E does not receive a heartbeat packet from a cluster unit for 6 * 200 = 1200 milliseconds or 1.2 seconds the FortiGate-7000E assumes that the other FortiGate-7000E has failed.
You can increase both the heartbeat interval and the lost heartbeat threshold to reduce false positives. For example, increasing the heartbeat interval to 20 (2000 ms) and the lost heartbeat threshold to 30 means a failure is assumed if no heartbeat packets are received for 30 * 2000 milliseconds = 60,000 milliseconds, or 60 seconds.
Use the following command to increase the heartbeat interval to 20 and the lost heartbeat threshold to 30:
config system ha
set hb-interval 20
set hb-lost-threshold 30
end
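The arithmetic in this section can be checked before you change the configuration. The following Python sketch is illustrative only and is not part of FortiOS. The function name failure_detection_ms is hypothetical, and the sketch assumes only the units and ranges stated in this section (hb-interval in units of 100 ms with a range of 1 to 20, hb-lost-threshold with a range of 1 to 60).

```python
# Hypothetical helper, not a FortiOS tool: estimates how long a
# FortiGate-7000E waits before assuming that the other unit has failed.

HB_INTERVAL_UNIT_MS = 100  # hb-interval is expressed in units of 100 ms


def failure_detection_ms(hb_interval: int = 2, hb_lost_threshold: int = 6) -> int:
    """Return the failure detection time in milliseconds.

    The defaults mirror the documented defaults (hb-interval 2,
    hb-lost-threshold 6). The range checks mirror the documented ranges.
    """
    if not 1 <= hb_interval <= 20:
        raise ValueError("hb-interval must be between 1 and 20")
    if not 1 <= hb_lost_threshold <= 60:
        raise ValueError("hb-lost-threshold must be between 1 and 60")
    return hb_interval * HB_INTERVAL_UNIT_MS * hb_lost_threshold


if __name__ == "__main__":
    print(failure_detection_ms())        # default settings: 1200 ms
    print(failure_detection_ms(20, 30))  # example above: 60000 ms
```

A value outside the documented ranges, such as an hb-interval of 30, raises an error in this sketch, which can help catch transposed values before they are entered in the CLI.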
Changing the time to wait in the hello state
The hello state hold-down time is the number of seconds that a FortiGate-7000E waits before changing from hello state to work state. After a failure or when starting up, FortiGate-7000Es in HA mode operate in the hello state to send and receive heartbeat packets to find each other and form a cluster. A FortiGate-7000E should change from the hello state to work state after it finds the FortiGate-7000E to form a cluster with. If for some reason the FortiGate-7000Es cannot find each other during the hello state, both FortiGate-7000Es may assume that the other one has failed and each could form a separate cluster of one FortiGate-7000E. The FortiGate-7000Es could eventually find each other and negotiate to form a cluster, possibly causing a network interruption as they re-negotiate.
The FortiGate-7000Es may be delayed in finding each other if they are located at different sites or if communication between the heartbeat interfaces is delayed for some other reason. If you find that your FortiGate-7000Es leave the hello state before finding each other, you can increase the time that they wait in the hello state. The hello state hold-down time range is 5 to 300 seconds. The default is 20 seconds.
Use the following command to increase the time to wait in the hello state to 1 minute (60 seconds):
config system ha
set hello-holddown 60
end