Remote link failover
Remote link failover uses link health monitors on the primary FortiGate to test connectivity with IP addresses of remote network devices, for example, a downstream router. Remote link failover causes a failover if one or more of these remote IP addresses does not respond to link health checking.
By being able to detect failures in network equipment not directly connected to the cluster, remote link failover can be useful in a number of ways depending on your network configuration. For example, in a full mesh HA configuration, with remote IP monitoring, the cluster can detect failures in network equipment that is not directly connected to the cluster but that would interrupt traffic processed by the cluster if the equipment failed.
Example HA remote IP monitoring topology
In the simplified example topology shown above, the switch connected directly to the primary unit is operating normally but the link on the other side of the switch fails. After the failure, traffic can no longer flow between the primary unit and the internet.
To detect this failure, you can enable remote link failover and create a link health monitor for port2 that causes the primary unit to test connectivity to 192.168.20.20. If the link health monitor can't connect to 192.168.20.20, the cluster fails over and the subordinate unit becomes the new primary unit. After the failover, the link health monitor on the new primary unit can connect to 192.168.20.20, so the failover maintains connectivity between the internal network and the internet through the cluster.
Remote link failover is active only on the primary unit and only the primary unit can detect a remote link failure. If the primary unit detects a remote link failure and causes a failover, the new primary unit may also detect this failure and cause another failover.
To reduce the potential number of failovers, remote IP monitoring includes a flip timer, set to a relatively high default value of 60 minutes. The flip timeout stops HA remote link failover from causing a failover until the primary unit has been operating for the duration of the flip timeout.
If you set the flip timeout to a relatively high number of minutes, you can find and repair the network problem that prevented the cluster from connecting to the remote IP address without the cluster experiencing very many failovers. Even if it takes a while to detect the problem, repeated failovers at relatively long time intervals do not usually disrupt network traffic.
Example remote link failover configuration
In most cases you should accept the default remote link failover configuration. The default configuration consists of:
- Enabling HA remote link failover for one or more FortiGate interfaces.
- Enabling link monitoring for those interfaces.
For example, the following configuration enables HA remote IP monitoring for the port2 interface:
config system ha
    set pingserver-monitor-interface port2
    set pingserver-failover-threshold 0
    set pingserver-slave-force-reset enable
    set pingserver-flip-timeout 60
end
The pingserver-failover-threshold, pingserver-slave-force-reset, and pingserver-flip-timeout options remain set to their default values.
After enabling HA remote link failover, you must configure a link monitor for the interface. The link monitor also includes the remote IP address to monitor. All other options, including ha-priority, remain set to their defaults:
config system link-monitor
    edit ha-link-monitor
        set server 192.168.20.20
        set srcintf port2
        set ha-priority 1
        set interval 1
        set failtime 5
    next
end
This configuration causes the primary unit to check the 192.168.20.20 IP address from the port2 interface and to cause a failover if the link monitor doesn't get a response from this IP address after 5 failed attempts. After a failover occurs, HA remote link failover can't cause another failover for at least 60 minutes. After 60 minutes, the cluster uses the normal primary unit selection process to select a primary unit. After the new primary unit is selected, link monitoring resumes operating as before.
You can adjust this configuration in the following ways:
- Enable remote link failover for more interfaces by adding them to pingserver-monitor-interface. You must also add a link monitor for each interface.
- If you have enabled override, you can disable pingserver-slave-force-reset to reduce the number of failovers. If override is enabled and a remote link failover has occurred, then after the flip timeout, even if the current primary unit is not experiencing a remote link failure, if pingserver-slave-force-reset is enabled, override causes the cluster to negotiate and select the FortiGate with the highest priority to become the primary unit. Then, if the remote link has not been restored for the FortiGate with the highest priority, remote link failover may cause another failover. But with override enabled, if pingserver-slave-force-reset is disabled, as long as the current primary unit is not experiencing a remote link failure, the cluster will not renegotiate. In brief, disabling pingserver-slave-force-reset prevents repeated failovers if the remote link is not restored for both FortiGates when the current primary unit experiences a remote link failure.
- Increase the interval or failtime to reduce how often a remote link failure is detected.
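As a sketch of the second and third adjustments above, assuming override is enabled on the cluster and reusing the ha-link-monitor entry from the earlier example, the configuration might look like the following. The interval and failtime values are illustrative only. Valid ranges and units can differ between FortiOS versions, so check them for your release before applying anything like this.

```
config system ha
    set override enable
    set pingserver-slave-force-reset disable
end
config system link-monitor
    edit ha-link-monitor
        set interval 5
        set failtime 10
    next
end
```

With settings along these lines, the link monitor would probe less often and tolerate more consecutive failures before reporting the remote link as down, at the cost of slower detection of a real outage.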
Changing the link monitor failover threshold
If you have multiple link monitors, you may want a failover to occur only if more than one of them fails.
For example, you may have three link monitors configured on three interfaces but only want a failover to occur if two of the link monitors fail. To do this, you must set the HA priorities of the link monitors and the HA pingserver-failover-threshold so that the priority of one link monitor is less than the failover threshold but the combined priorities of two link monitors are equal to or greater than the failover threshold. Failover occurs when the total HA priority of all failed link monitors reaches or exceeds the threshold.
For example, set the failover threshold to 10 and monitor three interfaces:
config system ha
    set pingserver-monitor-interface port2 port20 vlan_234
    set pingserver-failover-threshold 10
    set pingserver-flip-timeout 120
end
Then set the HA priority of each link monitor to 5.
The HA priority (ha-priority) setting is not synchronized among cluster units. In the following example, you must set the HA priority to 5 by logging into each cluster unit unless you only want this configuration to be active on one of the units in the cluster.
config system link-monitor
    edit port2
        set srcintf port2
        set server 192.168.20.20
        set ha-priority 5
    next
    edit port20
        set srcintf port20
        set server 192.168.20.30
        set ha-priority 5
    next
    edit vlan_234
        set srcintf vlan_234
        set server 172.20.12.10
        set ha-priority 5
    next
end
If only one of the link monitors fails, the total link monitor HA priority will be 5, which is lower than the failover threshold so a failover will not occur. If a second link monitor fails, the total link monitor HA priority of 10 will equal the failover threshold, causing a failover.
By adding multiple link monitors and setting the HA priorities for each, you can fine tune remote IP monitoring. For example, if it is more important to maintain connections to some networks you can set the HA priorities higher for these link monitors. And if it is less important to maintain connections to other networks you can set the HA priorities lower for these link monitors. You can also adjust the failover threshold so that if the cluster cannot connect to one or two high priority IP addresses a failover occurs. But a failover will not occur if the cluster cannot connect to one or two low priority IP addresses.
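As one possible illustration of this kind of weighting, the sketch below reuses the three link monitors from the previous example but assumes, purely for illustration, that the network reached through port2 is the most important one. With a failover threshold of 10, a failure of the port2 monitor alone (priority 10) reaches the threshold, while the port20 and vlan_234 monitors (priority 5 each) must both fail before a failover occurs. The priority values are illustrative, not recommendations.

```
config system ha
    set pingserver-monitor-interface port2 port20 vlan_234
    set pingserver-failover-threshold 10
end
config system link-monitor
    edit port2
        set srcintf port2
        set server 192.168.20.20
        set ha-priority 10
    next
    edit port20
        set srcintf port20
        set server 192.168.20.30
        set ha-priority 5
    next
    edit vlan_234
        set srcintf vlan_234
        set server 172.20.12.10
        set ha-priority 5
    next
end
```

Because ha-priority is not synchronized among cluster units, values like these would need to be set on each unit for the weighting to apply after a failover.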
Detecting HA remote IP monitoring failovers
Just as with any HA failover, you can detect HA remote link failover events using SNMP to monitor for HA traps. You can also use alert email to receive notifications of HA status changes and monitor log messages for HA failover log messages. In addition, the critical log message Ping Server is down is generated when a ping server fails. The log message includes the name of the interface for the ping server that detected the failure.