FortiGate-7000 Handbook

How link and module failures affect primary FortiGate-7000 selection

The total number of connected data interfaces in a FortiGate-7000 takes precedence over the number of failed modules when determining which FortiGate-7000 in an HA configuration becomes the primary FortiGate-7000. For example, if one chassis has a failed FPM and the other has a disconnected or failed data interface, the chassis with the failed FPM becomes the primary unit.

As another example, the following diagnose sys ha status output shows the HA status for a cluster where one FortiGate-7000 has a disconnected or failed data interface and the other FortiGate-7000 has a failed FPM.

diagnose sys ha status
==========================================================================
Slot: 2 Module SN: FIM01E3E16000088
Chassis HA mode: a-p

Chassis HA information:
[Debug_Zone HA information]
HA group member information: is_manage_master=1.
FG74E33E16000027: Master, serialno_prio=0, usr_priority=128, hostname=Chassis-K
FG74E13E16000072: Slave, serialno_prio=1, usr_priority=128, hostname=Chassis-J

HA member information:
Chassis-K(FIM01E3E16000088), Slave(priority=1), uptime=2237.46, slot=2, chassis=1(1)
    slot: 2, chassis_uptime=2399.58,
    state: worker_failure=1/2, lag=(total/good/down/bad-score)=2/2/0/0,
           intf_state=(port up)=0, force-state(0:none)
           traffic-bandwidth-score=20, mgmt-link=1
    hbdevs: local_interface= 2-M1 best=yes
            local_interface= 2-M2 best=no

Chassis-J(FIM01E3E16000031), Slave(priority=2), uptime=2151.75, slot=2, chassis=2(1)
    slot: 2, chassis_uptime=2151.75,
    state: worker_failure=0/2, lag=(total/good/down/bad-score)=2/2/0/0,
           intf_state=(port up)=0, force-state(0:none)
           traffic-bandwidth-score=20, mgmt-link=1
    hbdevs: local_interface= 2-M1 last_hb_time= 2399.81 status=alive
            local_interface= 2-M2 last_hb_time= 0.00 status=dead

Chassis-J(FIM01E3E16000033), Slave(priority=3), uptime=2229.63, slot=1, chassis=2(1)
    slot: 1, chassis_uptime=2406.78,
    state: worker_failure=0/2, lag=(total/good/down/bad-score)=2/2/0/0,
           intf_state=(port up)=0, force-state(0:none)
           traffic-bandwidth-score=20, mgmt-link=1
    hbdevs: local_interface= 2-M1 last_hb_time= 2399.81 status=alive
            local_interface= 2-M2 last_hb_time= 0.00 status=dead

Chassis-K(FIM01E3E16000086), Master(priority=0), uptime=2203.30, slot=1, chassis=1(1)
    slot: 1, chassis_uptime=2203.30,
    state: worker_failure=1/2, lag=(total/good/down/bad-score)=2/2/0/0,
           intf_state=(port up)=1, force-state(0:none)
           traffic-bandwidth-score=30, mgmt-link=1
    hbdevs: local_interface= 2-M1 last_hb_time= 2399.74 status=alive
            local_interface= 2-M2 last_hb_time= 0.00 status=dead

This output shows that chassis 1 (hostname Chassis-K) is the primary or master FortiGate-7000. The reason for this is that chassis 1 has a total traffic-bandwidth-score of 30 + 20 = 50, while the total traffic-bandwidth-score for chassis 2 (hostname Chassis-J) is 20 + 20 = 40.
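The per-chassis totals can be reproduced from the output with a short script. The following is a minimal Python sketch, not a Fortinet tool: the function name chassis_scores and the parsing patterns are assumptions made for illustration, and the sample text is an excerpt of the output above reduced to the member header lines and the score lines.

```python
import re
from collections import defaultdict

# Excerpt of the "HA member information" output shown above,
# reduced to the member header lines and the score lines.
SAMPLE = """\
Chassis-K(FIM01E3E16000088), Slave(priority=1), uptime=2237.46, slot=2, chassis=1(1)
           traffic-bandwidth-score=20, mgmt-link=1
Chassis-J(FIM01E3E16000031), Slave(priority=2), uptime=2151.75, slot=2, chassis=2(1)
           traffic-bandwidth-score=20, mgmt-link=1
Chassis-J(FIM01E3E16000033), Slave(priority=3), uptime=2229.63, slot=1, chassis=2(1)
           traffic-bandwidth-score=20, mgmt-link=1
Chassis-K(FIM01E3E16000086), Master(priority=0), uptime=2203.30, slot=1, chassis=1(1)
           traffic-bandwidth-score=30, mgmt-link=1
"""

def chassis_scores(text):
    """Sum traffic-bandwidth-score per chassis hostname (hypothetical helper)."""
    totals = defaultdict(int)
    current = None
    for line in text.splitlines():
        # A member header looks like "Chassis-K(FIM01E3E16000088), ..."
        header = re.match(r"\s*([^\s(]+)\(FIM\w+\),", line)
        if header:
            current = header.group(1)
            continue
        score = re.search(r"traffic-bandwidth-score=(\d+)", line)
        if score and current is not None:
            totals[current] += int(score.group(1))
    return dict(totals)

scores = chassis_scores(SAMPLE)
print(scores)  # {'Chassis-K': 50, 'Chassis-J': 40}
```

The chassis with the larger total (Chassis-K in this sample) is the one reported as Master.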

The output also shows that both FIMs in chassis 1 have detected a worker failure (worker_failure=1/2), while neither FIM in chassis 2 has detected a worker failure (worker_failure=0/2). The intf_state=(port up)=1 field shows that the FIM in slot 1 of chassis 1 has one more interface connected than the FIM in slot 1 of chassis 2. This extra connected interface gives the FIM in chassis 1 slot 1 a higher traffic bandwidth score than the FIM in slot 1 of chassis 2.

One of the interfaces on the FIM in slot 1 of chassis 2 must have failed or been disconnected. In a normal HA configuration, the FIMs in matching slots of each chassis should have redundant interface connections, so a module with fewer connected interfaces than its counterpart indicates a link failure.

FIM failures

If an FIM fails, HA not only recognizes this as a module failure but also gives the chassis with the failed FIM a much lower traffic bandwidth score. An FIM failure is therefore more likely to cause an HA failover than an FPM failure.

Also, the traffic bandwidth score for an FIM with more connected interfaces would be higher than the score for an FIM with fewer connected interfaces. So if a different FIM failed in each chassis, the chassis with the functioning FIM with the most connected data interfaces would have the highest traffic bandwidth score and would become the primary chassis.

Management link failures

Management connections to a FortiGate-7000 can affect primary chassis selection. If the management connection to one FortiGate-7000 becomes disconnected, a failover occurs and the FortiGate-7000 that still has management connections becomes the primary FortiGate-7000.

If there are no failures and you haven't configured any settings to influence primary chassis selection, the chassis with the highest serial number becomes the primary chassis.

The serial number is a convenient way to differentiate FortiGate-7000 chassis, so basing primary chassis selection on it is predictable and easy to understand and interpret. Also, the chassis with the highest serial number would usually be the newest chassis with the most recent hardware version. In many cases you may not need active control over primary chassis selection, so basic primary chassis selection based on serial number is sufficient.

In some situations you may want to control which chassis becomes the primary chassis. You can do this by setting the priority of one chassis higher than the priority of the other. If you change the priority of one of the chassis, then during negotiation the chassis with the highest priority becomes the primary chassis. FortiGate-7000 FGCP selects the primary chassis based on priority before serial number. For more information about how to use priorities, see How link and module failures affect primary FortiGate-7000 selection.

Chassis uptime is also a factor. Normally, when two chassis start up, their uptimes are similar and do not affect primary chassis selection. However, if one of the chassis goes down during operation, the other will have a much higher uptime and will be selected as the primary chassis before priority and serial number are tested.
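The factors described in this section can be pictured as a chain of tie-breakers. The following Python sketch is only an illustration, not the FGCP implementation. The text states that connected interfaces are considered before failed modules, and that uptime is tested before priority, which is tested before serial number. Where the management link and uptime rank relative to the traffic bandwidth score is an assumption made here. The Chassis class, the pick_primary function, the UPTIME_MARGIN value and the uptime figures in the example are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Chassis:
    name: str
    serial: str
    bandwidth_score: int   # total traffic bandwidth score of the chassis
    failed_modules: int    # number of failed FIMs/FPMs
    mgmt_link_up: bool
    uptime: float          # seconds
    priority: int

# Hypothetical margin: uptimes closer than this are treated as "similar".
UPTIME_MARGIN = 300.0

def pick_primary(a, b):
    """Return the chassis that would win under the assumed ordering."""
    criteria = [
        (a.bandwidth_score, b.bandwidth_score),   # more connected interfaces wins
        (-a.failed_modules, -b.failed_modules),   # fewer failed modules wins
        (a.mgmt_link_up, b.mgmt_link_up),         # assumed position in the chain
    ]
    if abs(a.uptime - b.uptime) > UPTIME_MARGIN:
        criteria.append((a.uptime, b.uptime))     # much higher uptime wins
    criteria.append((a.priority, b.priority))     # higher priority wins
    criteria.append((a.serial, b.serial))         # higher serial number wins
    for left, right in criteria:
        if left != right:
            return a if left > right else b
    return a

# Values loosely based on the example output; the uptimes are made up.
k = Chassis("Chassis-K", "FG74E33E16000027", 50, 1, True, 2400.0, 128)
j = Chassis("Chassis-J", "FG74E13E16000072", 40, 0, True, 2400.0, 128)
print(pick_primary(k, j).name)  # Chassis-K
```

In this example Chassis-K wins on traffic bandwidth score even though it has a failed module, which matches the behavior described at the start of this section.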

Verifying primary chassis selection

You can use the diagnose sys ha status command to verify which chassis has become the primary chassis, as shown in the following example output. This output also shows that the chassis with the highest serial number was selected to be the primary chassis.

diagnose sys ha status
==========================================================================
Current slot: 1  Module SN: FIM04E3E16000085
Chassis HA mode: a-p

Chassis HA information:
[Debug_Zone HA information]
HA group member information: is_manage_master=1.
FG74E83E16000015:  Slave, serialno_prio=1, usr_priority=128, hostname=CH15
FG74E83E16000016: Master, serialno_prio=0, usr_priority=127, hostname=CH16
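If you collect this output from several clusters, the primary chassis can be picked out programmatically. The following Python sketch assumes the group member lines keep the format shown above; the function name find_primary and the regular expression are illustrative assumptions, not part of any Fortinet tool.

```python
import re

# The "HA group member information" lines from the output above.
SAMPLE = """\
FG74E83E16000015:  Slave, serialno_prio=1, usr_priority=128, hostname=CH15
FG74E83E16000016: Master, serialno_prio=0, usr_priority=127, hostname=CH16
"""

# One group member per line: serial number, role, then the hostname.
MEMBER = re.compile(
    r"^(?P<serial>FG\w+):\s+(?P<role>Master|Slave),.*hostname=(?P<hostname>\S+)",
    re.MULTILINE,
)

def find_primary(text):
    """Return (serial, hostname) of the member reported as Master, or None."""
    for match in MEMBER.finditer(text):
        if match.group("role") == "Master":
            return match.group("serial"), match.group("hostname")
    return None

primary = find_primary(SAMPLE)
print(primary)  # ('FG74E83E16000016', 'CH16')
```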
