HA primary unit selection criteria

In an FGCP HA setup, cluster members negotiate to determine which unit becomes the primary unit when they connect to the HA cluster. Once a primary unit is identified, all other members become subordinate (or secondary) members.

When does primary unit selection occur?

Primary unit selection occurs whenever a new unit joins the HA cluster or the primary unit leaves the HA cluster. It also occurs whenever a monitored interface status changes.

This can occur when:

Two or more units initially form a new HA cluster.
A new unit joins an existing HA cluster.
A device failover takes place, where the primary unit fails due to a device failure.
A link failover takes place, where a monitored interface on any unit either fails or is restored.

Relevant configurations

Configurations that can impact HA primary unit selection are listed below:

Priority

A value from 0 to 255 assigned to this unit. A higher number indicates higher priority. By default, priority is 128.

The priority value is not synchronized to other HA members.

Monitor

Interface(s) to check for a physical link failure.

Override

Enable to rank the priority value above HA uptime in HA primary unit selection. Disable to rank HA uptime above the priority value.

This setting is disabled by default.

From CLI:

config system ha
    set priority <integer>
    set monitor <interface list>
    set override {enable | disable}
end

From GUI:

On the System > HA page:

Primary unit selection criteria

If the HA override setting is disabled on all cluster members, the primary unit will be selected based on the following order:

1. Connected monitored ports
2. HA uptime
3. Priority
4. Serial number

If the HA override setting is enabled on all cluster members, the primary unit will be selected based on the following order:

1. Connected monitored ports
2. Priority
3. HA uptime
4. Serial number

For each criterion, if the value is the same, it is considered a tie, and the next criterion is evaluated.

For the HA uptime criteria:

If the difference in HA uptime is more than five (5) minutes (300 seconds), the cluster unit that has been operating longer becomes the primary unit.
If the difference in HA uptime is less than five (5) minutes (300 seconds), the criterion is considered a tie.
If a monitored interface fails on an HA unit, its HA uptime is reset to zero (0).
If a cluster member restarts, the HA uptime is reset to zero (0).

In some documents, the terms MUPS and MPUS, which are based on the first letter of each criterion, are used to describe the order in which the criteria are considered during the HA primary unit selection process.
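
The selection order described above can be sketched in Python for a two-unit cluster. This is an illustrative model only, not FortiOS code. The Unit class, its field names, and the select_primary function are hypothetical. The sketch assumes that MUPS and MPUS expand to monitored ports, uptime, priority, and serial number, that more connected monitored ports wins, that the higher serial number wins the final tie-break, and that an uptime difference exactly equal to the margin counts as a tie.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Hypothetical model of one cluster member (field names are assumptions)."""
    serial: str
    connected_monitored_ports: int
    ha_uptime: int   # seconds
    priority: int    # 0 to 255, higher wins


def select_primary(a, b, override=False, margin=300):
    """Return the unit that would win under the assumed selection order."""

    def monitor(x, y):
        # Assumption: the unit with more connected monitored ports wins.
        return x.connected_monitored_ports - y.connected_monitored_ports

    def uptime(x, y):
        # Differences within the margin are treated as a tie.
        diff = x.ha_uptime - y.ha_uptime
        return diff if abs(diff) > margin else 0

    def priority(x, y):
        return x.priority - y.priority

    def serial(x, y):
        # Assumption: the higher serial number wins the last tie-break.
        return (x.serial > y.serial) - (x.serial < y.serial)

    if override:
        order = [monitor, priority, uptime, serial]   # MPUS
    else:
        order = [monitor, uptime, priority, serial]   # MUPS

    for criterion in order:
        result = criterion(a, b)
        if result != 0:
            return a if result > 0 else b
    return a  # identical on every criterion; not expected with unique serials


old_unit = Unit("FG101FTK19xxxxx7", 2, 1000, 100)
new_unit = Unit("FG101FTK19xxxxx8", 2, 100, 200)
print(select_primary(old_unit, new_unit, override=False).serial)
print(select_primary(old_unit, new_unit, override=True).serial)
```

With override disabled, the longer-running unit wins because the uptime difference exceeds the margin. With override enabled, the unit with the higher priority wins before uptime is considered.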

Viewing the role of the unit

After HA primary unit selection has completed, you can view the HA role of each unit in various ways.

In the GUI, go to System > HA to view the members in the cluster and the role for each member.

From the CLI, run get system ha status. The role of each unit is displayed:

# get system ha status
…
Primary: FG101FTK19xxxxx7, HA operating index = 0
Secondary: FG101FTK19xxxxx8, HA operating index = 1

Similarly, from the CLI, run diagnose sys ha status. The role of each unit is displayed.

Viewing how the primary unit was selected

You can use the get system ha status command to see how the primary unit was selected. The output of this command contains a section called "Primary selected using" that shows a history of how the primary unit was selected, with the most recent event listed first.

# get system ha status
HA Health Status:
    WARNING: FG101FTK19xxxxx7 has hbdev down;
    WARNING: FG101FTK19xxxxx8 has hbdev down;
Model: FortiGate-101F
Mode: HA A-A
Group Name: FGT_HA
Group ID: 0
Debug: 0
Cluster Uptime: 5 days 8h:30m:57s
Cluster state change time: 2024-04-12 02:25:05
Primary selected using:
    <2024/04/12 02:25:05> vcluster-1: FG101FTK19xxxxx7 is selected as the primary because its override priority is larger than peer member FG101FTK19xxxxx8.
    <2024/04/12 02:25:04> vcluster-1: FG101FTK19xxxxx7 is selected as the primary because it's the only member in the cluster.
    <2024/04/12 02:13:34> vcluster-1: FG101FTK19xxxxx7 is selected as the primary because its override priority is larger than peer member FG101FTK19xxxxx8.
    <2024/04/12 02:09:28> vcluster-1: FG101FTK19xxxxx7 is selected as the primary because it's the only member in the cluster.
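
If you collect this output in a script, the history lines can be split into a timestamp, a virtual cluster, a serial number, and a reason. The following Python sketch is an assumption-laden example: the regular expression is derived only from the sample lines shown above, and the parse_selection_history function is hypothetical. Other FortiOS versions may format these lines differently.

```python
import re
from datetime import datetime

# Pattern inferred from the sample output above; not an official format.
HISTORY_RE = re.compile(
    r"<(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})>\s+"
    r"(?P<vcluster>vcluster-\d+):\s+"
    r"(?P<serial>\S+) is selected as the primary because (?P<reason>.+?)\.?$"
)


def parse_selection_history(output):
    """Return one dict per 'Primary selected using' history line."""
    events = []
    for line in output.splitlines():
        match = HISTORY_RE.search(line.strip())
        if match:
            events.append({
                "time": datetime.strptime(match["ts"], "%Y/%m/%d %H:%M:%S"),
                "vcluster": match["vcluster"],
                "primary": match["serial"],
                "reason": match["reason"],
            })
    return events


SAMPLE = """Primary selected using:
    <2024/04/12 02:25:05> vcluster-1: FG101FTK19xxxxx7 is selected as the primary because its override priority is larger than peer member FG101FTK19xxxxx8.
    <2024/04/12 02:25:04> vcluster-1: FG101FTK19xxxxx7 is selected as the primary because it's the only member in the cluster.
"""

for event in parse_selection_history(SAMPLE):
    print(event["time"], event["primary"], "-", event["reason"])
```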

Comparing the HA uptime between cluster members

You can use the CLI command diagnose sys ha dump-by group to display the age difference of the units in a cluster. This command also displays a number of HA-related parameters for each cluster unit.

For example, consider a cluster of two FortiGate units. Entering the diagnose sys ha dump-by group command from the primary unit CLI displays information similar to the following:

# diagnose sys ha dump-by group 
...
vcluster_nr=1
vcluster-1: start_time=1712913904(2024-04-12 02:25:04), state/o/chg_time=2(work)/2(work)/1712913904(2024-04-12 02:25:04)
        pingsvr_flip_timeout/expire=3600s/0s
        'FG101FTK19xxxxx8': ha_prio/o=1/1, link_failure=0, pingsvr_failure=0, flag=0x00000000, mem_failover=0, uptime/reset_cnt=0/2
        'FG101FTK19xxxxx7': ha_prio/o=0/0, link_failure=0, pingsvr_failure=0, flag=0x00000001, mem_failover=0, uptime/reset_cnt=189/2

The last two lines of the output display status information about each cluster unit, including the uptime. The uptime is the age difference in seconds between the two units in the cluster: the unit with the lowest age shows 0, and the other unit shows how much older it is.

In the example, FG101FTK19xxxxx7 shows an uptime of 189 and FG101FTK19xxxxx8 shows 0, so the age difference between the two units is 189 seconds. The age difference is less than five (5) minutes (less than 300 seconds), so age has no effect on primary unit selection.
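
The comparison above can be reproduced from captured output. The following Python sketch extracts the uptime field for each member and checks the difference against the margin. The function names are hypothetical, the regular expression is based only on the sample output above, and the value is assumed to be in seconds, as the text states.

```python
import re

# Pattern inferred from the sample member lines above; not an official format.
MEMBER_RE = re.compile(
    r"'(?P<serial>[^']+)':.*?uptime/reset_cnt=(?P<uptime>\d+)/(?P<resets>\d+)"
)


def parse_member_uptimes(output):
    """Map each serial number to its reported uptime value."""
    return {m["serial"]: int(m["uptime"]) for m in MEMBER_RE.finditer(output)}


def uptime_is_tie(uptimes, margin=300):
    """True when the age difference is within the margin (assumed inclusive)."""
    diff = max(uptimes.values()) - min(uptimes.values())
    return diff <= margin


SAMPLE = """vcluster_nr=1
vcluster-1: start_time=1712913904(2024-04-12 02:25:04), state/o/chg_time=2(work)/2(work)/1712913904(2024-04-12 02:25:04)
        pingsvr_flip_timeout/expire=3600s/0s
        'FG101FTK19xxxxx8': ha_prio/o=1/1, link_failure=0, pingsvr_failure=0, flag=0x00000000, mem_failover=0, uptime/reset_cnt=0/2
        'FG101FTK19xxxxx7': ha_prio/o=0/0, link_failure=0, pingsvr_failure=0, flag=0x00000001, mem_failover=0, uptime/reset_cnt=189/2
"""

uptimes = parse_member_uptimes(SAMPLE)
print(uptimes)
print("tie on uptime:", uptime_is_tie(uptimes))
```

With the default margin of 300 seconds, the 189-second difference is a tie. Passing a smaller margin, such as margin=60, would make uptime decisive in this sketch.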

Changing the cluster age difference margin

You can change the cluster age difference margin using the following command:

config system ha
    set ha-uptime-diff-margin <margin> 
end

Where the <margin> can be from 1 to 65535 seconds (default = 300).

Resetting the uptime of a unit

For debugging purposes, you may want to reset an HA member’s uptime without restarting the unit or changing the status of a monitored interface.

To manually reset the uptime:

# diagnose sys ha reset-uptime

The command resets the HA age internally and does not affect the uptime displayed for cluster units by the diagnose sys ha dump-by all-vcluster command. It also does not affect the time displayed on the Dashboard or in the cluster members list.
