Handbook

Troubleshooting

6.0.0
Troubleshooting

This section describes some HA clustering troubleshooting techniques.

Ignoring hardware revisions

Many FortiGate platforms have gone through multiple hardware versions and in some cases the hardware changes prevent cluster formation. If you run into this problem you can use the following command on each FortiGate to cause the cluster to ignore different hardware versions:

execute ha ignore-hardware-revision enable

This command is only available on FortiGates that have had multiple hardware revisions.

By default the command is set to prevent cluster formation between FortiGates with different hardware revisions. You can enter the following command to view its status:

execute ha ignore-hardware-revision status

Usually the incompatibility is caused by different hardware versions having different hard disks. Enabling this command disables the hard disks in each FortiGate, so the cluster will not support logging to the hard disk or WAN Optimization.

If the FortiGates do have compatible hardware versions or if you want to run a FortiGate in standalone mode you can enter the following command to disable ignoring the hardware revision and enable the hard disks:

execute ha ignore-hardware-revision disable

Affected models include but are not limited to:

  • FortiGate-100D
  • FortiGate-300C
  • FortiGate-600C
  • FortiGate-800C
  • FortiGate-80C and FortiWiFi-80C
  • FortiGate-60C
Note: It's possible that a cluster will not form because the disk partition sizes of the cluster units are different. You can use the diagnose sys ha checksum test | grep storage command to check the disk storage checksum of each cluster unit. If the checksums are different, visit the Fortinet Support website for help in setting up compatible storage partitions.
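When a cluster has several units, comparing the captured checksum output by hand is error-prone. The following Python sketch assumes you have saved the output of the command above from each unit as text. The unit names and sample text are hypothetical, and lines are compared as opaque strings because the exact output format is not described here.

```python
def group_units_by_output(outputs):
    """Group units whose captured command output is identical.

    `outputs` maps a unit name to the text captured from that unit.
    Lines are compared as opaque strings after trimming whitespace.
    More than one group in the result means the units disagree.
    """
    groups = {}
    for unit, text in outputs.items():
        # Ignore blank lines and surrounding whitespace when comparing.
        key = tuple(line.strip() for line in text.splitlines() if line.strip())
        groups.setdefault(key, []).append(unit)
    return sorted(sorted(units) for units in groups.values())


# Hypothetical captures; real command output will look different.
captured = {
    "unit-a": "storage: aa 11 22\n",
    "unit-b": "storage: aa 11 22  \n\n",
    "unit-c": "storage: bb 33 44\n",
}
groups = group_units_by_output(captured)
print(groups)
```

If the result contains a single group, all units reported the same storage checksum lines.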

Before you set up a cluster

Before you set up a cluster ask yourself the following questions about the FortiGates that you are planning to use to create a cluster.

  1. Do all the FortiGates have the same hardware configuration, including the same hard disk configuration?
  2. Do all of the FortiGates have the same FortiGuard, FortiCloud, FortiClient, VDOM and FortiOS Carrier licensing?
  3. Do all the FortiGates have the same firmware build?
  4. Are all the FortiGates set to the same operating mode (NAT or transparent)?
  5. Are all the FortiGates operating in single VDOM mode?
  6. If the FortiGates are operating in multiple VDOM mode do they all have the same VDOM configuration?

Note: In some cases you may be able to form a cluster even if the FortiGates have different firmware builds, different VDOM configurations, or different operating modes. However, if you encounter problems, they may be resolved by installing the same firmware build on each unit and giving the units the same VDOM configuration and operating mode. If the FortiGates in the cluster have different licenses, the cluster will form but it will operate with the lowest licensing level.
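The checklist above amounts to confirming that a handful of attributes match on every unit. As a minimal Python sketch, assuming you have recorded those attributes yourself, the comparison could look like this. The attribute names and values are hypothetical labels for illustration and are not FortiOS field names.

```python
def find_mismatches(units):
    """Return {attribute: {value: [unit names]}} for attributes that differ.

    `units` maps a unit name to a dict of recorded attributes.
    Attributes with the same value on every unit are omitted.
    """
    mismatches = {}
    attributes = sorted({attr for info in units.values() for attr in info})
    for attr in attributes:
        by_value = {}
        for name, info in units.items():
            # A missing attribute is recorded as None so it shows up as a difference.
            by_value.setdefault(info.get(attr), []).append(name)
        if len(by_value) > 1:
            mismatches[attr] = by_value
    return mismatches


# Hypothetical inventory gathered manually from two units.
inventory = {
    "unit-a": {"firmware_build": "build-x", "operating_mode": "NAT", "vdom_mode": "single"},
    "unit-b": {"firmware_build": "build-x", "operating_mode": "transparent", "vdom_mode": "single"},
}
report = find_mismatches(inventory)
print(report)
```

An empty result means every recorded attribute matched across the units.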

Troubleshooting the initial cluster configuration

This section describes how to check a cluster when it first starts up to make sure that it is configured and operating correctly. This section assumes you have already configured your HA cluster.

To verify that a cluster can process traffic and react to a failure
  1. Add a basic security policy configuration and send network traffic through the cluster to confirm connectivity.

    For example, if the cluster is installed between the internet and an internal network, set up a basic internal to external security policy that accepts all traffic. Then from a PC on the internal network, browse to a website on the internet or ping a server on the internet to confirm connectivity.

  2. From your management PC, set ping to continuously ping the cluster, and then start a large download, or in some other way establish ongoing traffic through the cluster.
  3. While traffic is going through the cluster, disconnect the power from one of the cluster units.

    You could also shut down or restart a cluster unit.

    Traffic should continue with minimal interruption.

  4. Start up the cluster unit that you disconnected.

    The unit should re-join the cluster with little or no effect on traffic.

  5. Disconnect a cable from one of the HA heartbeat interfaces.

    The cluster should keep functioning, using the other HA heartbeat interface.

  6. If you have port monitoring enabled, disconnect a network cable from a monitored interface.

    Traffic should continue with minimal interruption.
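To put a number on "minimal interruption" during the tests above, you can measure the longest gap in the continuous ping. The Python sketch below assumes you have already turned the ping results into a list of booleans, one per ping sent. The sample data and the one-second interval are assumptions for illustration.

```python
def longest_outage(replies, interval_seconds=1.0):
    """Return the longest run of consecutive lost pings, in seconds.

    `replies` holds one boolean per ping sent: True if a reply was
    received, False if it timed out. The result is approximate because
    it assumes pings are sent at a fixed interval.
    """
    longest = current = 0
    for ok in replies:
        # Reset the counter on a reply; otherwise extend the current outage.
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest * interval_seconds


# Hypothetical results recorded while a cluster unit was powered off.
samples = [True, True, False, False, False, True, True, False, True]
print(longest_outage(samples))
```

Running the same measurement for each failure test (power loss, heartbeat cable, monitored interface) gives comparable figures.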

To verify the cluster configuration from the GUI

Use these steps to verify the status and configuration of a cluster that has already formed.

  1. Log into the cluster GUI.
  2. Check the system dashboard to verify that the System Information widget displays all of the cluster units.
  3. Check the Unit Operation widget graphic to verify that the correct cluster unit interfaces are connected.
  4. Go to System > HA or from the System Information dashboard widget select HA Status > Configure and verify that all of the cluster units are displayed on the HA Cluster list.
  5. From the cluster members list, edit the primary unit and verify the cluster configuration is as expected.

To troubleshoot the cluster configuration from the GUI

Use these steps if the FortiGates don't successfully form a cluster:

  1. Connect to each cluster unit GUI and verify that the HA configurations are the same. The HA configurations of all of the cluster units must be identical. Even though the HA configuration is very simple you can easily make a small mistake that prevents a FortiGate from joining a cluster.
  2. If the configurations are the same, try re-entering the HA Password on each cluster unit in case you made an error typing the password when configuring one of the cluster units.
  3. Check that the correct interfaces of each cluster unit are connected.

    Check the cables and interface LEDs.

    Use the Unit Operation dashboard widget, system network interface list, or cluster members list to verify that each interface that should be connected actually is connected.

    If the link is down re-verify the physical connection. Try replacing network cables or switches as required.

To verify the cluster configuration from the CLI

Use these steps to verify the status and configuration of a cluster that has already formed.

  1. Log into each cluster unit CLI.

    You can use the console connection if you need to avoid the problem of units having the same IP address.

  2. Enter the command get system status.

    Look for the following information in the command output.

    Current HA mode: a-a, master
        The cluster units are operating as a cluster and you have connected to the primary unit.
    Current HA mode: a-a, backup
        The cluster units are operating as a cluster and you have connected to a subordinate unit.
    Current HA mode: standalone
        The cluster unit is not operating in HA mode.
  3. Verify that the get system ha status command shows that the cluster health is OK and shows that all of the cluster units have joined the cluster.
  4. Enter the get system ha command to verify that the HA configuration is correct and the same for each cluster unit.
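If you collect the get system status output from many units, the HA mode line can be checked with a short script. This Python sketch relies only on the Current HA mode line format shown in step 2. The function name is hypothetical, and any other lines in the sample input are placeholders.

```python
def parse_ha_mode(status_output):
    """Extract (mode, role) from the 'Current HA mode' line.

    Returns None if the line is not present. For a standalone unit
    the role is None because the line carries no role.
    """
    for line in status_output.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip() == "Current HA mode":
            parts = [part.strip() for part in value.split(",")]
            mode = parts[0]
            role = parts[1] if len(parts) > 1 else None
            return mode, role
    return None


print(parse_ha_mode("Current HA mode: a-a, master"))
print(parse_ha_mode("Current HA mode: standalone"))
```

A unit that reports standalone, or no HA mode line at all, has not joined the cluster.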

To troubleshoot the cluster configuration from the CLI

Try these steps if the FortiGates don't successfully form a cluster:

  1. Try using the following command to re-enter the cluster password on each cluster unit in case you made an error typing the password when configuring one of the cluster units.

    config system ha
        set password <password>
    end

  2. Check that the correct interfaces of each cluster unit are connected.

    Check the cables and interface LEDs.

    Use the get hardware nic <interface_name> command to confirm that each interface is connected. If the interface is connected, the command output should contain a Link: up entry similar to the following:

    get hardware nic port1
    ...
    Link: up
    ...

    If the link is down, re-verify the physical connection. Try replacing network cables or switches as required.
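The link check in step 2 can also be scripted when there are many interfaces to inspect. This Python sketch looks for the Link: entry described above in captured get hardware nic output. The function name is hypothetical, and treating any value other than "up" as a down link is an assumption.

```python
def link_is_up(nic_output):
    """Return True or False based on the 'Link:' entry, or None if absent."""
    for line in nic_output.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip() == "Link":
            # Assumption: anything other than "up" means the link is not up.
            return value.strip().lower() == "up"
    return None


print(link_is_up("...\nLink: up\n..."))
```

A result of None means the captured text did not contain a Link: entry, which usually indicates that the wrong output was captured.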

More troubleshooting information

Much of the information in this HA guide can be useful for troubleshooting HA clusters. Here are some links to sections with more information.

  • If sessions are lost after a failover you may need to change route-ttl to keep synchronized routes active longer. See Synchronizing kernel routing tables.
  • To control which cluster unit becomes the primary unit, you can change the device priority and enable override. See Controlling primary unit selection using device priority and override.
  • Changes made to a cluster can be lost if override is enabled. See Configuration changes can be lost if override is enabled.
  • When override is enabled, after a failover traffic may be disrupted if the primary unit rejoins the cluster before the session tables are synchronized or for other reasons such as if the primary unit is configured for DHCP or PPPoE. See Delaying how quickly the primary unit rejoins the cluster when override is enabled.
  • In some cases, age differences among cluster units result in the wrong cluster unit becoming the primary unit. For example, if a cluster unit set to a high priority reboots, that unit will have a lower age than other cluster units. You can resolve this problem by resetting the age of one or more cluster units. See Primary unit selection with override disabled (default). You can also adjust how sensitive the cluster is to age differences. This can be useful if large age differences cause problems. See Cluster age difference margin (grace period) and Changing the cluster age difference margin.
  • If one of the cluster units needs to be serviced or removed from the cluster for other reasons, you can do so without affecting the operation of the cluster. See Disconnecting a FortiGate.
  • The GUI and CLI will not allow you to configure HA if you have enabled FGSP HA. See FGSP.
  • The GUI and CLI will not allow you to configure HA if one or more FortiGate interfaces is configured as a PPTP or L2TP client.
  • The FGCP is compatible with DHCP and PPPoE but care should be taken when configuring a cluster that includes a FortiGate interface configured to get its IP address with DHCP or PPPoE. Fortinet recommends that you turn on DHCP or PPPoE addressing for an interface after the cluster has been configured. See DHCP and PPPoE compatibility.
  • Some third-party network equipment may prevent HA heartbeat communication, resulting in a failure of the cluster or the creation of a split brain scenario. For example, some switches use packets with the same Ethertype as HA heartbeat packets for their own internal functions. When such a switch carries HA heartbeat traffic, it generates CRC errors and does not forward the heartbeat packets. See Heartbeat packet Ethertypes.
  • Very busy clusters may not be able to send HA heartbeat packets quickly enough, also resulting in a split brain scenario. You may be able to resolve this problem by modifying HA heartbeat timing. See Modifying heartbeat timing.
  • Very busy clusters may suffer performance reductions if session pickup is enabled. If possible you can disable this feature to improve performance. If you require session pickup for your cluster, several options are available for improving session pickup performance. See Improving session synchronization performance.
  • If it takes longer than expected for a cluster to fail over you can try changing how the primary unit sends gratuitous ARP packets. See Changing how the primary unit sends gratuitous ARP packets after a failover.
  • When you first put a FortiGate in HA mode you may lose connectivity to the unit. This occurs because HA changes the MAC addresses of all FortiGate interfaces, including the one that you are connecting to. The cluster MAC addresses also change if you change some HA settings such as the cluster group ID. The connection will be restored in a short time as your network and PC update to the new MAC address. To reconnect sooner, you can update the ARP table of your management PC by deleting the ARP table entry for the FortiGate (or just deleting all ARP table entries). You may be able to delete the ARP table of your management PC from a command prompt using a command similar to arp -d.
  • Since HA changes all cluster unit MAC addresses, if your network uses MAC address filtering you may have to make configuration changes to account for the HA MAC addresses.
  • A network may experience packet loss when two FortiGate HA clusters are deployed in the same broadcast domain, because the clusters can end up with conflicting MAC addresses. The packet loss can be diagnosed by pinging from one cluster to the other or by pinging both of the clusters from a device within the broadcast domain. You can resolve the MAC address conflict by changing the HA Group ID configuration of the two clusters. The HA Group ID is sometimes also called the Cluster ID. See Diagnosing packet loss with two FortiGate HA clusters in the same broadcast domain.
  • The cluster CLI displays "slave is not in sync" messages if there is a synchronization problem between the primary unit and one or more subordinate units. See How to diagnose HA out of sync messages.
  • If you have configured dynamic routing and the new primary unit takes too long to update its routing table after a failover you can configure graceful restart and also optimize how routing updates are synchronized. See Routing graceful restart and Synchronizing kernel routing tables.
  • Some switches may not be able to detect that the primary unit has become a subordinate unit and will keep sending packets to the former primary unit. This can occur after a link failover if the switch does not detect the failure and does not clear its MAC forwarding table. See Updating MAC forwarding tables when a link failover occurs.
  • If a link not directly connected to a cluster unit (for example, between a switch connected to a cluster interface and the network) fails you can enable remote link failover to maintain communication. See Remote link failover.
  • If you find that some cluster units are not running the same firmware build you can reinstall the correct firmware build on the cluster to upgrade all cluster units to the same firmware build. See Synchronizing the firmware build running on a new cluster unit.
