HA heartbeat interface
The HA heartbeat allows cluster units to communicate with each other. The heartbeat consists of hello packets that are sent at regular intervals by the heartbeat interface of all cluster units. The hello packets describe the state of the cluster unit (including communication sessions) and are used by other cluster units to keep the cluster synchronized. While the cluster is operating, the HA heartbeat confirms that all cluster units are functioning normally.
HA heartbeat packets are Layer 2 Ethernet frames that use EtherType values of 0x8890 and 0x8891 rather than 0x0800 for normal 802.3 IP packets. The default time interval between HA heartbeats is 200 ms.
As a best practice, isolate the heartbeat devices from the user networks by connecting them to a dedicated switch that is not connected to any network. The heartbeat packets contain sensitive information about the cluster configuration and may use a considerable amount of network bandwidth. If the cluster consists of two FortiGates, connect the heartbeat device interfaces back-to-back using a crossover cable. If there are more than two FortiGates, each heartbeat interface should be connected to a dedicated switch. For example, in a four-member HA cluster with two heartbeat interfaces, there would be two switches (one switch dedicated to each interface).
Upon starting up, a FortiGate configured for HA broadcasts HA heartbeat hello packets from its HA heartbeat interface to find other FortiGates configured to operate in HA mode. If two or more FortiGates operating in HA mode connect with each other, they compare HA configurations (mode, password, and group ID). If the HA configurations match, then the units negotiate to form a cluster.
The HA heartbeat interface communicates with each unit in the cluster using the same heartbeat interface for each member. For example, if port1 and port2 are the heartbeat interfaces for the HA cluster, then in a cluster consisting of two members, port1 on one member communicates with port1 on the other member, and port2 communicates with port2.
Configuring an HA heartbeat interface
A heartbeat interface is an Ethernet network interface in a cluster that is used by the FGCP for HA heartbeat communications between cluster units.
By default, two interfaces are configured to be heartbeat interfaces on most FortiGate models. The heartbeat interface configuration can be changed to select an additional or different heartbeat interface. It is possible to select only one heartbeat interface; however, this is not a recommended configuration (see Split brain scenario).
Another important setting in the HA configuration is the heartbeat interface priority. In all cases, the heartbeat interface with the highest priority is used for all HA heartbeat communication. If the interface fails or becomes disconnected, then the selected heartbeat interface with the next highest priority handles all HA heartbeat communication.
If more than one heartbeat interface has the same priority, the heartbeat interface with the highest priority that is also highest in the heartbeat interface list is used for all HA heartbeat communication. If this interface fails or becomes disconnected, then the selected heartbeat interface with the highest priority that is next highest in the list handles all heartbeat communication (see Selecting heartbeat packets and interfaces).
The default heartbeat interface configuration sets the priority of both heartbeat interfaces to 50, and the range is 0 to 512. When selecting a new heartbeat interface, the default priority is 0. The higher the number, the higher the priority.
In most cases, the default heartbeat interface configuration can be maintained as long as the heartbeat interfaces are connected. Configuring HA heartbeat interfaces is the same for virtual clustering and for standard HA clustering. Up to eight heartbeat interfaces can be selected. This limit only applies to FortiGates with more than eight physical interfaces.
Heartbeat communications can be enabled on physical interfaces, but not on switch ports, VLAN subinterfaces, IPsec VPN interfaces, redundant interfaces, or 802.3ad aggregate interfaces.
To change the heartbeat interfaces in the GUI:
- Go to System > HA and select a Mode.
- Click the + in the Heartbeat interfaces field to select an interface.
- Click OK.
To configure two interfaces as heartbeat interfaces with the same priority in the CLI:
config system ha
    set hbdev port4 150 port5 150
end
In this example, port4 and port5 are configured as the HA heartbeat interfaces and they both have a priority of 150.
To configure two interfaces as heartbeat interfaces with different priorities in the CLI:
config system ha
    set hbdev port4 100 port1 50
end
In this example, port4 and port1 are configured as the HA heartbeat interfaces. The priority for port4 is higher (100) than port1 (50), so port4 is the preferred HA heartbeat interface.
Split brain scenario
At least one heartbeat interface must be selected for the HA cluster to function correctly. This interface must be connected to all the units in the cluster. If heartbeat communication is interrupted and cannot fail over to a second heartbeat interface, then the cluster units will not be able to communicate with each other and more than one cluster unit may become a primary unit. As a result, the cluster stops functioning normally because multiple devices on the network may be operating as primary units with the same IP and MAC addresses, creating a split brain scenario.
Sharing heartbeat interfaces with traffic ports
HA heartbeat and data traffic are supported on the same cluster interface. In NAT mode, if a heartbeat interface is also used for processing network traffic, then the interface can be assigned any IP address. The IP address does not affect HA heartbeat traffic.
In transparent mode, the heartbeat interface can be connected to the network with management access enabled on the same interface. A management connection would then be established to the interface using the transparent mode management IP address. This configuration does not affect HA heartbeat traffic.
While these configurations are allowable, they are not recommended. When possible, use dedicated interfaces for heartbeat traffic.
Selecting heartbeat packets and interfaces
HA heartbeat hello packets are sent constantly by all of the enabled heartbeat interfaces. Using these hello packets, each cluster unit confirms that the other cluster units are still operating. The FGCP selects one of the heartbeat interfaces to be used for communication between the cluster units. The selection is based on the linkfail states of the heartbeat interfaces, the heartbeat interface priority, and the interface index. The connected heartbeat interface with the highest priority is selected for heartbeat communication.
If more than one connected heartbeat interface has the highest priority, then the FGCP selects the heartbeat interface with the lowest interface index. The interface index order is visible in the CLI by running the diagnose netlink interface list command.
If the interface that is processing heartbeat traffic fails or becomes disconnected, the FGCP uses the same criteria to select another heartbeat interface for heartbeat communication. If the original heartbeat interface is fixed or reconnected, the FGCP selects this interface again for heartbeat communication.
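The selection rules described above can be sketched in Python. This is only an illustration of the stated criteria (link state, then highest priority, then lowest interface index), not the FGCP implementation. The HeartbeatInterface class, its field names, and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class HeartbeatInterface:
    """Hypothetical model of one configured heartbeat interface."""
    name: str       # for example "port4"
    priority: int   # hbdev priority; a higher number is preferred
    index: int      # interface index; a lower number wins a priority tie
    link_up: bool   # False when the link has failed or is disconnected


def select_heartbeat_interface(
    interfaces: Iterable[HeartbeatInterface],
) -> Optional[HeartbeatInterface]:
    """Return the interface that would carry heartbeat communication.

    Only connected interfaces are considered. Among those, the highest
    priority wins, and the lowest interface index breaks ties.
    Returns None when no heartbeat interface is connected.
    """
    connected = [i for i in interfaces if i.link_up]
    if not connected:
        return None
    return min(connected, key=lambda i: (-i.priority, i.index))
```

For example, with port4 and port5 both at priority 150 and assumed interface indexes of 7 and 8, the function picks port4. If port4 is marked as disconnected it picks port5, and it picks port4 again once port4 is reconnected, which mirrors the behavior described above.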
The HA heartbeat interface communicates cluster session information, synchronizes the cluster configuration, synchronizes the cluster kernel routing table, and reports individual cluster member statuses. The HA heartbeat constantly communicates HA status information to make sure that the cluster is operating properly.
Modifying heartbeat timing
The heartbeat interval and heartbeat lost threshold are two variables that dictate the length of time one cluster unit will wait before determining a peer is dead.
config system ha
    set hb-interval <integer>
    set hb-interval-in-milliseconds {100 | 10}
    set hb-lost-threshold <integer>
end

hb-interval <integer>
    Set the time between sending heartbeat packets; increase to reduce false positives (1 - 20, default = 2).
hb-interval-in-milliseconds {100 | 10}
    Set the number of milliseconds for each heartbeat interval (100 or 10, default = 100).
hb-lost-threshold <integer>
    Set the number of lost heartbeats to signal a failure; increase to reduce false positives (1 - 60, default = 20).
Heartbeats are sent out every 2 × 100 ms, and it takes 20 consecutive lost heartbeats for a cluster member to be detected as dead. Therefore, it takes by default 2 × 100 ms × 20 = 4000 ms, or 4 seconds, for a failure to be detected.
Sub-second heartbeat failure detection can be achieved by lowering the interval and threshold or lowering the heartbeat interval unit of measurement from 100 ms to 10 ms.
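The detection-time arithmetic above can be checked with a short Python sketch. The function name and parameter names are hypothetical; the ranges and defaults are the ones listed for hb-interval, hb-interval-in-milliseconds, and hb-lost-threshold.

```python
def failure_detection_ms(
    hb_interval: int = 2,
    hb_interval_unit_ms: int = 100,
    hb_lost_threshold: int = 20,
) -> int:
    """Return the time, in milliseconds, before a silent peer is declared dead.

    detection time = hb-interval x interval unit (ms) x hb-lost-threshold
    """
    if not 1 <= hb_interval <= 20:
        raise ValueError("hb-interval must be between 1 and 20")
    if hb_interval_unit_ms not in (100, 10):
        raise ValueError("hb-interval-in-milliseconds must be 100 or 10")
    if not 1 <= hb_lost_threshold <= 60:
        raise ValueError("hb-lost-threshold must be between 1 and 60")
    return hb_interval * hb_interval_unit_ms * hb_lost_threshold
```

With the defaults, the function returns 4000 (4 seconds). Keeping the default interval and threshold but switching the unit to 10 ms gives 2 × 10 ms × 20 = 400 ms, which is one way to reach sub-second detection.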
If the primary unit does not receive a heartbeat packet from a subordinate unit before the heartbeat threshold expires, the primary unit assumes that the subordinate unit has failed.
If a subordinate unit does not receive a heartbeat packet from the primary unit before the heartbeat threshold expires, the subordinate unit assumes that the primary unit has failed. The subordinate unit then begins negotiating to become the new primary unit.
The HA heartbeat packets consume more bandwidth if the heartbeat interval is short. But if the heartbeat interval is very long, the cluster is not as sensitive to topology and other network changes. Therefore, gauge your settings based on the amount of traffic and CPU usage sustainable by the cluster units versus the tolerance for an outage when the primary unit fails. Avoid using the heartbeat interfaces as traffic ports to prevent congesting the interfaces.
Changing the time to wait in the hello state
The hello state hold down time is the number of seconds that a cluster unit waits before changing from hello state to work state. After a failure or when starting up, cluster units operate in the hello state to send and receive heartbeat packets so that all the cluster units can find each other and form a cluster. A cluster unit should change from the hello state to work state after it finds all the other FortiGates to form a cluster with.
If all cluster units cannot find each other during the hello state, then some cluster units may join the cluster after it has formed. This can cause disruptions to the cluster and affect how it operates. A delay could occur if the cluster units are located at different sites or if communication is delayed between the heartbeat interfaces. If delays occur, increase the time that cluster units wait in the hello state.
config system ha
    set hello-holddown <integer>
end

hello-holddown <integer>
    Set the time to wait before changing from hello to work state, in seconds (5 - 300, default = 20).
Configuring HA heartbeat encryption and authentication
HA heartbeat packets can be encrypted and authenticated. Enable both features if the cluster interfaces that send HA heartbeat packets are also connected to your networks. HA heartbeat encryption and authentication are disabled by default. Note that enabling these settings could reduce cluster performance.
config system ha
    set authentication {enable | disable}
    set encryption {enable | disable}
end
If HA heartbeat packets are not encrypted, the cluster password and changes to the cluster configuration could be exposed. An attacker may be able to sniff HA packets to get cluster information. Enabling HA heartbeat message authentication prevents an attacker from creating false HA heartbeat messages. False HA heartbeat messages could affect the stability of the cluster.
HA authentication and encryption use AES-128 for encryption and SHA1 for authentication. Heartbeat messages are encrypted and encapsulated in ESP packets for transfer in an IPsec tunnel between the cluster members.
Heartbeat bandwidth requirements
The majority of the traffic processed by the HA heartbeat interface is session synchronization traffic. Other heartbeat interface traffic required to synchronize IPsec states, IPsec keys, routing tables, configuration changes, and so on is usually negligible.
The amount of traffic required for session synchronization depends on the connections per second (CPS) that the cluster is processing, since only new sessions (and session table updates) need to be synchronized.
Another factor to consider is that if session pickup is enabled, the traffic on the heartbeat interface surges during a failover or when a unit joins or re-joins the cluster. When one of these events occurs, the entire session table needs to be synchronized. Lower throughput HA heartbeat interfaces may increase failover time if they cannot handle the higher demand during these events.
The amount of heartbeat traffic can also be reduced by:
- Turning off session pickup if it is not needed
- Enabling session-pickup-delay to reduce the number of sessions that are synchronized
- Using the session-sync-dev option to move session synchronization traffic off of the heartbeat link
Heartbeat packet EtherTypes
Normal 802.3 IP packets have an EtherType field value of 0x0800. EtherType values other than 0x0800 are understood as Layer 2 frames rather than IP packets.
HA heartbeat packets use the following EtherTypes:
Field value | Function | Description |
---|---|---|
0x8890 | Heartbeat | Heartbeat packets are used by cluster units to find other cluster units, and to verify the status of other cluster units while the cluster is operating. |
0x8891 | Traffic redistribution from primary to subordinate | These are used when the HA primary needs to redistribute traffic packets and the corresponding session information to the subordinate units in active-active (A-A) mode. |
0x8892 | Session synchronization | Session synchronization uses the heartbeat interfaces for communication, unless session synchronization devices are specified. See Session synchronization for more information. |
0x8893 | HA Telnet sessions (configuration synchronization) | The Telnet sessions are used to synchronize the cluster configurations, and to connect from one cluster unit's CLI to another when an administrator uses the execute ha manage command. |
Session synchronization
Since large amounts of session synchronization traffic can increase network congestion, it is recommended to keep this traffic off of the network and separate from the HA heartbeat interfaces by using dedicated connections for it. The interfaces are configured in the session-sync-dev setting.
The session synchronization device interfaces must be connected together, either directly using the appropriate cable or through switches. If one of the interfaces becomes disconnected, then the cluster uses the remaining interfaces for session synchronization. If all the session synchronization interfaces become disconnected, then session synchronization reverts to using the HA heartbeat link.
All session synchronization traffic is between the primary unit and each subordinate unit. Session synchronization always uses UDP/708, but it is encapsulated differently depending on the session-sync-dev setting. If session-sync-dev is specified, the packets use EtherType 0x8892 and exit over the configured port. If session-sync-dev is not specified, the packets use EtherType 0x8893 and exit over the heartbeat port.
Session synchronization packets are typically processed by a single CPU core because all source and destination MAC addresses of the L2 frames are the same. Hashing based on the L2 addresses maps the processing of the frames to the same core. When large amounts of session synchronization traffic must be processed, enable the sync-packet-balance setting to distribute the processing to more cores. This effectively uses a larger set of MAC addresses for the hashing to map to multiple cores.
Troubleshooting heartbeat packets
Understanding the different types of heartbeat packets will ease troubleshooting. Heartbeat packets are recognized as Layer 2 frames. The switches and routers on the heartbeat network that connect to heartbeat interfaces must be configured to allow them to pass through. If Layer 2 frames are dropped by these network devices, then the heartbeat traffic will not be allowed between the cluster units.
For example, some third-party network equipment may not allow EtherType 0x8893. The unit can still be found in the HA cluster, but you would be unable to run execute ha manage to manage the other unit. If the connected switch does not forward the default EtherTypes, use the following settings to change the EtherTypes of the HA heartbeat packets.
config system ha
    set ha-eth-type <hex_value>
    set hc-eth-type <hex_value>
    set l2ep-eth-type <hex_value>
end
To change the EtherType values of the heartbeat and HA Telnet session packets:
config system ha
    set ha-eth-type 8895
    set l2ep-eth-type 889f
end
For troubleshooting issues with packets sent or received on the HA heartbeat ports, use the following diagnostic command to sniff the traffic by EtherType.
# diagnose sniffer packet any 'ether proto <EtherType_in_hex>' 6 0 l
To sniff the traffic on EtherType 0x8890:
# diagnose sniffer packet any 'ether proto 0x8890' 6 0 l
Using Original Sniffing Mode
interfaces=[any]
filters=[ether proto 0x8890]
2022-10-19 16:22:26.512813 port5 out Ether type 0x8890 printer hasn't been added to sniffer.
0x0000 0000 0000 0000 000c 293b e61c 8890 5201 ........);....R.
0x0010 020c 6e65 7700 0000 0000 0000 0000 0000 ..new...........
0x0020 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0030 0000 0000 0700 0000 0000 0000 0000 8738 ...............8
0x0040 0100 706f 7274 3500 0000 0000 0000 0000 ..port5.........
0x0050 0000 0300 843d 4647 564d 3034 544d 3232 .....=FGVM04TM22
0x0060 3030 3236 3338 0b00 0100 000c 0001 00c8 002001..........
0x0070 0d00 0100 000e 0004 0009 0000 000f 0004 ................
0x0080 0000 0000 0010 0004 0000 0000 0011 0004 ................
0x0090 0000 0000 0012 0004 0001 0000 0028 0000 .............(..
0x00a0 002b 0002 000a 002c 0002 000a 0038 0008 .+.....,.....8..
0x00b0 00c0 0300 0000 0000 0037 0004 0000 0000 .........7......
0x00c0 003c 0030 0030 2704 175f 0858 9d4f 5611 .<.0.0'.._.X.OV.
0x00d0 2005 6310 b1b0 be14 e029 1f5b 61fd 5b49 ..c......).[a.[I
0x00e0 7cad bed4 ecaf 05bd 70c3 2adc 4fa0 6ab7 |.......p.*.O.j.
0x00f0 4d5d 1df7 4f3d 000c 0007 0000 0002 0000 M]..O=..........
0x0100 0085 0400 003e 0001 0000 4000 0400 0000 .....>....@.....
0x0110 0000 3f00 2400 0000 0000 0000 0000 0000 ..?.$...........
0x0120 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0130 0000 0000 0000 0000 0000 3300 0400 0000 ..........3.....
0x0140 0000 2a00 7200 0a00 789c edcc 290e c250 ..*.r...x...)..P
0x0150 1440 d19f d420 5068 3449 5dcb d009 8b66 .@....Ph4I]....f
0x0160 2b34 8435 b302 3401 9e22 6f05 15e7 c82b +4.5..4.."o....+
0x0170 ee7c bb3f daf2 675d 9f9f af6a fee6 7dce .|.?..g]...j..}.
0x0180 efc8 879c 5791 8f39 6f22 9f72 de46 ee72 ....W..9o".r.F.r
0x0190 de45 ee73 6eca 2f0f 394f 91c7 9c2f 3169 .E.sn./.9O.../1i
0x01a0 9b94 af55 0100 0000 0000 0000 0000 0000 ...U............
0x01b0 0058 ac0f 0096 24af 0000 0000 .X....$.....
2022-10-19 16:22:26.545236 port5 in Ether type 0x8890 printer hasn't been added to sniffer.
0x0000 ffff ffff ffff 000c 29ca ba5d 8890 5201 ........)..]..R.
0x0010 020c 6e65 7700 0000 0000 0000 0000 0000 ..new...........
0x0020 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0030 0000 0000 0700 0000 0000 0000 0000 8738 ...............8
0x0040 0100 706f 7274 3500 0000 0000 0000 0000 ..port5.........
0x0050 0000 0300 d221 4647 564d 3034 544d 3232 .....!FGVM04TM22
0x0060 3030 3236 3339 0b00 0100 000c 0001 0080 002002..........
0x0070 0d00 0100 000e 0004 0000 0000 000f 0004 ................
0x0080 0000 0000 0010 0004 0000 0000 0011 0004 ................
0x0090 0000 0000 0012 0004 0000 0000 0028 0000 .............(..
0x00a0 002b 0002 000a 002c 0002 000a 0038 0008 .+.....,.....8..
0x00b0 00e6 0400 0000 0000 0037 0004 0000 0000 .........7......
0x00c0 003c 0030 0029 6d7e 3407 2d31 c00f 42b3 .<.0.)m~4.-1..B.
0x00d0 59b6 17cb 4be7 d043 a158 e74c 5841 c821 Y...K..C.X.LXA.!
0x00e0 7843 b598 c95d 3dcf 81a9 bc8b b304 53f3 xC...]=.......S.
0x00f0 17b6 3cd5 a83d 000c 0007 0000 0002 0000 ..<..=..........
0x0100 0085 0400 0040 0004 0000 0000 003f 0024 .....@.......?.$
0x0110 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0120 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0130 0000 0000 0033 0004 0000 0000 002a 0073 .....3.......*.s
0x0140 000a 0078 9ced cc21 1282 5014 40d1 3f43 ...x...!..P.@.?C
0x0150 7523 3651 414c 66b2 994c 1419 9bd9 ec7e u#6QALf..L.....~
0x0160 5c82 ab52 5e72 de0a 0ce7 c41b ee74 996f \..R^r.......t.o
0x0170 75f9 b15a bf5f 4d35 7df3 36e7 53e4 5dce u..Z._M5}.6.S.].
0x0180 7de4 7dce e7c8 4dce 43e4 36e7 31f2 21e7 }.}...M.C.6.1.!.
0x0190 6b59 7297 f33d f231 e747 4cea 4dca cfaa kYr..=.1.GL.M...
0x01a0 0000 0000 0000 0000 0000 0000 00fc ad0f ................
0x01b0 c16c 2917 0000 0000 .l).....
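When reading a capture like this, it can help to decode the first 14 bytes of each frame (destination MAC, source MAC, EtherType) by hand or with a small script. The following Python sketch is an illustration only. The function and dictionary names are hypothetical, and the mapping assumes the default EtherTypes listed in the table above (it no longer applies if ha-eth-type, hc-eth-type, or l2ep-eth-type have been changed).

```python
import struct

# Default HA EtherTypes, taken from the table in this section.
HA_ETHERTYPES = {
    0x8890: "Heartbeat",
    0x8891: "Traffic redistribution from primary to subordinate",
    0x8892: "Session synchronization",
    0x8893: "HA Telnet sessions (configuration synchronization)",
}


def parse_ethernet_header(frame: bytes):
    """Return (destination MAC, source MAC, EtherType) for an Ethernet frame."""
    if len(frame) < 14:
        raise ValueError("an Ethernet header is 14 bytes long")
    # Network byte order: 6-byte destination, 6-byte source, 2-byte EtherType.
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])

    def fmt(mac: bytes) -> str:
        return ":".join(f"{b:02x}" for b in mac)

    return fmt(dst), fmt(src), ethertype


def describe(frame: bytes) -> str:
    """Name the HA function of a frame, or report a non-HA EtherType."""
    _, _, ethertype = parse_ethernet_header(frame)
    return HA_ETHERTYPES.get(ethertype, f"not a default HA EtherType (0x{ethertype:04x})")
```

Feeding the function the first row of the outgoing packet above (`0000 0000 0000 000c 293b e61c 8890 5201`, with the spaces removed and converted using `bytes.fromhex`) yields a source MAC of 00:0c:29:3b:e6:1c and EtherType 0x8890, which `describe` reports as Heartbeat. The incoming packet has the broadcast destination ff:ff:ff:ff:ff:ff.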
Interface IP addresses
An FGCP cluster communicates heartbeat packets using Layer 2 frames over the physical heartbeat interface, but it also communicates other synchronization traffic, logs, and locally generated traffic from subordinate devices over Layer 3 IP packets. Additional virtual interfaces are created in the hidden vsys_ha VDOM, which need to be addressed with IPv4 addresses.
The FGCP uses link-local IPv4 addresses (see RFC 3927) in the 169.254.0.x range for the virtual HA heartbeat interface (port_ha) and for the inter-VDOM link interfaces between the vsys_ha and management VDOM. When members join an HA cluster, each member's heartbeat interface (port_ha) is assigned an IP address from the range of 169.254.0.1 to 169.254.0.63/26. HA inter-VDOM link interfaces (havdlink0 and havdlink1) are assigned IP addresses from the range of 169.254.0.65 to 169.254.0.66/26.
The IP address that is assigned to a virtual heartbeat interface depends on the serial number priority of the member. Higher serial numbers have a higher priority, and therefore a lower serialno_prio number, for example:
# diagnose sys ha status
...
FGVM08TM20002002: Secondary, serialno_prio=0, usr_priority=128, hostname=FGVM08TM20002002
FGVM08TM19003001: Primary, serialno_prio=1, usr_priority=128, hostname=FGVM08TM19003001
The member with serialno_prio=0 is assigned IP address 169.254.0.1, serialno_prio=1 is assigned 169.254.0.2, and so forth.
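The mapping from serialno_prio to a port_ha address can be sketched in Python. This is a minimal illustration of the rule described above, not FortiOS code. The function names are hypothetical, and ordering serial numbers by plain string comparison is an assumption made for the example.

```python
import ipaddress

# Subnet used for the virtual heartbeat interfaces, as stated above.
PORT_HA_SUBNET = ipaddress.ip_network("169.254.0.0/26")


def port_ha_address(serialno_prio: int) -> str:
    """Return the port_ha address for a member: prio 0 -> .1, prio 1 -> .2, ..."""
    # The documented range is 169.254.0.1 to 169.254.0.63.
    if not 0 <= serialno_prio <= 62:
        raise ValueError("serialno_prio is outside the documented address range")
    host = PORT_HA_SUBNET.network_address + serialno_prio + 1
    return f"{host}/{PORT_HA_SUBNET.prefixlen}"


def assign_port_ha_addresses(serial_numbers):
    """Map each serial number to its port_ha address.

    Assumption: a "higher" serial number is one that sorts later as a string,
    so the highest serial number gets serialno_prio=0.
    """
    ordered = sorted(serial_numbers, reverse=True)
    return {serial: port_ha_address(prio) for prio, serial in enumerate(ordered)}
```

Applied to the two members in the example output, FGVM08TM20002002 (serialno_prio=0) maps to 169.254.0.1/26 and FGVM08TM19003001 (serialno_prio=1) maps to 169.254.0.2/26.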
To view the HA heartbeat interface IP address of the primary unit:
# get system ha status
...
vcluster 1: work 169.254.0.2
...
To view all the assigned IP addresses of a device:
# diagnose ip address list
IP=172.16.151.84->172.16.151.84/255.255.255.0 index=3 devname=port1
IP=192.168.2.204->192.168.2.204/255.255.255.0 index=6 devname=port2
IP=10.10.10.1->10.10.10.1/255.255.255.0 index=9 devname=port3
IP=127.0.0.1->127.0.0.1/255.0.0.0 index=13 devname=root
IP=127.0.0.1->127.0.0.1/255.0.0.0 index=16 devname=vsys_ha
IP=169.254.0.2->169.254.0.2/255.255.255.192 index=17 devname=port_ha
IP=127.0.0.1->127.0.0.1/255.0.0.0 index=18 devname=vsys_fgfm
IP=169.254.0.65->169.254.0.65/255.255.255.192 index=19 devname=havdlink0
IP=169.254.0.66->169.254.0.66/255.255.255.192 index=20 devname=havdlink1
When generating traffic from a subordinate unit, traffic will be routed to the primary unit’s port_ha virtual heartbeat interface. From there, if traffic is destined to another network, the traffic is routed from the vsys_ha VDOM to the management VDOM by the havdlink interfaces.
Use the execute traceroute command on the subordinate unit to display HA heartbeat IP addresses and the HA inter-VDOM link IP addresses.
To trace the route from a subordinate unit to an IP address:
# execute ha manage 1
# execute traceroute 172.20.20.10
traceroute to 172.20.20.10 (172.20.20.10), 32 hops max, 72 byte packets
 1 169.254.0.1 0 ms 0 ms 0 ms
 2 169.254.0.66 0 ms 0 ms 0 ms
 3 172.20.20.10 0 ms 0 ms 0 ms
To run a sniffer trace on the primary unit to view the traffic flow:
# diagnose sniffer packet any 'net 169.254.0.0/24' 4 0 l