Administration Guide

Using high availability (HA)

FortiMail units can be configured to act in a high availability (HA) cluster or group of clusters to increase processing capacity and/or availability, so that your overall deployment uptime is preserved even if some individual hardware or software fails. Deployments may require changes to the network topology or DNS records to achieve this, depending on the HA mode.

This section contains the following topics:

About HA types

Supported FortiMail HA deployment types are:

  • Member HA: Multiple FortiMail units work together in one HA pair or cluster.
  • Group HA: Multiple HA clusters work together in a group.

For example, if you have one data center to protect, you only need one cluster. However, if you have multiple data centers for redundancy or capacity, then you can join the clusters together to form an HA group.

Each cluster in an HA group has its own HA mode. At the HA group level, there is also an HA mode that defines throughput or failover amongst the clusters. Depending on your throughput or failover requirements, you can mix the HA modes at the member HA and group HA level.

See also

About HA modes

Deploying member HA

Deploying group HA

About HA modes

FortiMail HA clusters and groups of clusters can operate with HA mode as either:

  • active-passive
  • active-active

This determines network topology, fault tolerance, and total throughput.

Active-passive HA:

  • 2 FortiMail units or clusters
  • Deployed behind a switch
  • Both configuration* and data synchronized^
  • Only primary unit or HA group processes email
  • No data loss^ when hardware fails
  • No increased processing capacity

Active-active HA:

  • 2-24 FortiMail units or clusters
  • Deployed either behind a load balancer or with multiple DNS MX records to distribute connections among units or clusters (usually in larger organizations with email server farms)
  • Only configuration* synchronized
  • All units process email
  • Data loss when hardware fails
  • Increased processing capacity

* For exceptions, see Settings that are not synchronized by HA.

^ For exceptions, see Synchronization of MTA queue directories after a failover.

Figure: Active-passive member HA operating in gateway mode (only the primary unit processes email)

Figure: Active-active member HA operating in gateway mode (all available units process email)

Figure: Group HA with a mix of active-active and active-passive clusters

When a FortiMail unit or cluster fails, its email traffic is interrupted. SMTP clients usually handle this gracefully, and restart a new connection. Traffic is redirected away from the point of failure by different methods that vary by HA mode:

  • Active-passive: A secondary becomes primary and starts using the Virtual IP address (or Virtual IPv6 address) and/or Virtual hostname. It then uses ARP to notify the nearby OSI Layer 2 switch or router about the link change, and they automatically redirect traffic.

  • Active-active: The load balancer health check detects a failure. It stops distributing traffic to failed FortiMail units or clusters. Only live FortiMail units continue to receive traffic.
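As a rough illustration of the active-active case, the following Python sketch shows how a load balancer style health check might filter out failed units. The addresses, the smtp_probe helper, and the rule that a 220 greeting means "healthy" are assumptions made for this example. They do not describe how FortiMail or any specific load balancer implements its checks.

```python
import socket
from typing import Callable, Iterable, List


def smtp_probe(host: str, port: int = 25, timeout: float = 3.0) -> bool:
    """Return True if the host answers with an SMTP 220 greeting (illustrative check)."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            return conn.recv(512).startswith(b"220")
    except OSError:
        # Refused, unreachable, or timed out: treat the unit as failed.
        return False


def live_units(units: Iterable[str],
               probe: Callable[[str], bool] = smtp_probe) -> List[str]:
    """Keep only the units that pass the health check."""
    return [unit for unit in units if probe(unit)]
```

Passing a different probe function (for example, one that checks a management API) keeps the filtering logic unchanged, which is also how the sketch can be exercised without a live network.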

Traffic for other IP addresses, such as administrative connections on the management network interfaces, may continue to reach the failed unit or cluster, depending on your network topology. This can be used to troubleshoot the cause of failure, or to reconfigure HA.

See also

About HA heartbeat and synchronization

About logging, alert email, and SNMP for HA

Storing mail data from HA clusters on a NAS server

Mixing HA modes

Mix the cluster's HA mode and the group's HA mode, if needed for your use case. For example, to increase service uptime and reduce data loss risks, you could join active-passive HA clusters together into an active-passive group. This reduces the risk of data loss slightly more than a standalone active-passive cluster does. However, it also reduces throughput to a fraction of what is possible, because only the primary unit of the primary cluster (for example, 1 of 4 units in a group of two 2-unit clusters) actively processes email. Instead, you may prefer to mix HA modes for a balance of availability and performance: active-passive clusters in an active-active group.
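The throughput trade-off can be made concrete with a little arithmetic. The sketch below assumes a group of two clusters with two units each, all healthy, and counts how many units process email for a given combination of modes. The function names are hypothetical and exist only for this example.

```python
from fractions import Fraction


def active_members(mode: str, count: int) -> int:
    """Number of members that process email at one HA level."""
    if mode == "active-passive":
        return 1        # only the primary processes email
    if mode == "active-active":
        return count    # all members process email
    raise ValueError(f"unknown HA mode: {mode}")


def processing_fraction(group_mode: str, cluster_mode: str,
                        clusters: int = 2, units_per_cluster: int = 2) -> Fraction:
    """Fraction of all units that process email when every unit is healthy."""
    busy = (active_members(group_mode, clusters)
            * active_members(cluster_mode, units_per_cluster))
    return Fraction(busy, clusters * units_per_cluster)
```

Under these assumptions, an active-passive group of active-passive clusters uses 1/4 of the units, while active-passive clusters in an active-active group use 1/2.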

Synchronization varies by HA mode. Therefore a mix of modes can change which settings are synchronized.

About HA heartbeat and synchronization

Heartbeat network interfaces:

  • monitor for failure of the primary unit in the HA cluster or group of clusters (a health check)

  • synchronize configuration changes from the primary unit to secondaries (and in groups of clusters, from the primary cluster to the secondaries)

    For exceptions, see Settings that are not synchronized by HA.

  • (active-passive only, and only if enabled) synchronize the mail queue, FortiMail system mail directory, and user home directories

    For exceptions, see Storing mail data from HA clusters on a NAS server.

Note

Synchronization intervals vary.

  • FortiGuard Antispam and FortiGuard Antivirus packages: Not synchronized.
  • Mail queue: Up to 20 minutes (not real time).
  • Configuration: Real time.

If configuration synchronization did not occur when expected, or if you have inadvertently de-synchronized the secondary unit’s configuration (for example, if a cable was accidentally disconnected), then you can manually initiate synchronization on either the primary unit or the secondary unit.

Periodically, secondary units verify configuration synchronization. If it's not in sync, then secondary units pull changes from the primary, and reload the configuration. In active-active HA, block list and safe list changes are also pushed from the secondary to the primary unit, and then synchronized to all other secondary units.
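The periodic verification can be pictured as "compare, then pull if different". The following Python sketch is only a model of that idea: comparing SHA-256 digests is an assumption of this example (the guide does not state how FortiMail compares configurations), and the setting names are made up.

```python
import hashlib


def config_digest(config: dict) -> str:
    """Stable digest of a configuration (hypothetical comparison method)."""
    canonical = "\n".join(f"{key}={config[key]}" for key in sorted(config))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_and_pull(primary: dict, secondary: dict,
                    unsynced_keys=frozenset()) -> bool:
    """Pull synchronized settings from the primary if the digests differ.

    Returns True if the secondary pulled changes (and would then reload).
    Keys in unsynced_keys, such as a host name, are left untouched.
    """
    shared_primary = {k: v for k, v in primary.items() if k not in unsynced_keys}
    shared_secondary = {k: v for k, v in secondary.items() if k not in unsynced_keys}
    if config_digest(shared_primary) == config_digest(shared_secondary):
        return False
    for key in shared_secondary:
        if key not in shared_primary:
            del secondary[key]      # setting was removed on the primary
    secondary.update(shared_primary)
    return True
```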

Note

Due to the introduction of primary backup in active-active HA in FortiMail 7.4.0, communication between the secondary units is also required. In config-only HA before the 7.4.0 release, it was not required.

Heartbeats from the primary to secondaries must not be interrupted. Exceptions include when:

  • the primary reboots, or you enter the execute reload command in the CLI. If the primary unit reboots or reloads its configuration, then it signals to the secondary unit to wait for additional time.
  • Remote services as heartbeat is enabled.

Note

Remote service monitoring does not provide synchronization, and therefore is not a complete, long-term replacement for the heartbeat. Heartbeat links that are disrupted should be fixed as soon as possible.

If the heartbeat signal is lost, or if the primary detects failure via another method (hard drive monitoring, interface monitoring, or remote service monitoring), then behavior varies by HA mode:

  • Active-passive: A secondary unit or cluster becomes the new primary, and starts processing email ("failover").

  • Active-active: If Primary backup has been selected, then your preferred secondary unit or cluster will take over the role of the primary (Effective becomes Primary) for the purpose of configuration synchronization. All units continue to process email, except the failed unit.

    If a Primary backup is not selected, or if the new primary also fails, then each secondary continues as a secondary. However, with no designated primary unit, changes to the configuration are not synchronized anymore. Repair the primary or reconfigure the HA cluster to form a heartbeat with a new primary.
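The role changes described above can be summarized as a small decision function. This is a simplified model, not FortiMail logic: the role strings and the "failed" marker are invented for the example, and in the active-passive case it simply promotes the first secondary in the table.

```python
from typing import Dict, Optional


def roles_after_primary_failure(ha_mode: str, roles: Dict[str, str],
                                primary_backup: Optional[str] = None) -> Dict[str, str]:
    """Return effective roles after the primary fails.

    roles maps a unit name to "primary" or "secondary".
    """
    if ha_mode not in ("active-passive", "active-active"):
        raise ValueError(f"unknown HA mode: {ha_mode}")
    result = {name: ("failed" if role == "primary" else role)
              for name, role in roles.items()}
    secondaries = [name for name, role in roles.items() if role == "secondary"]
    if ha_mode == "active-passive":
        if secondaries:
            result[secondaries[0]] = "primary"    # failover: starts processing email
    elif primary_backup in secondaries:
        result[primary_backup] = "primary"        # takes over configuration sync
    # Active-active without a primary backup: every secondary stays secondary,
    # keeps processing email, and configuration sync stops.
    return result
```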

Some failure causes are temporary. Depending on Action on failure, even if the primary can recover, it might not automatically return to the HA cluster or group of clusters and reclaim its role. You can manually trigger this with Restore.

See also

Service Monitor section

About HA types

About HA modes

About HA port numbers and protocols

About logging, alert email, and SNMP for HA

Settings that are not synchronized by HA

Storing mail data from HA clusters on a NAS server

Synchronization of MTA queue directories after a failover

About HA port numbers and protocols

The default protocol and port numbers for HA heartbeat, synchronization, and service monitoring communications are configurable. See HA base port, the control packet setting in the FortiMail CLI Reference, and Appendix C: Port Numbers.

Note

If a firewall is between the primary and secondary FortiMail unit, then verify that the firewall policy allows HA port numbers. Blocked HA ports can cause incorrect failover and synchronization failure.

Settings that are not synchronized by HA

All settings on the primary unit are synchronized to the secondary unit, except the following:

Settings

Explanation

Licenses

FortiGuard Antivirus, FortiGuard Antispam, and other service subscription and feature licenses are specific to each FortiMail unit, and are not synchronized, regardless of HA mode.

Operation mode

You must set the operation mode (gateway, transparent, or server) of each FortiMail unit before they join HA. Many settings vary by operation mode, and therefore configurations cannot be synchronized if the operation mode is different.

Host name

Different host names are used to distinguish members of the HA cluster when connecting to the GUI and to indicate which unit failed. For details, see Host name.

Static route

Static routes are not synchronized because some or all of the network interfaces on each FortiMail unit in the HA cluster may be connected to different subnets. See also Configuring static routes.

Interface configuration

(gateway and server mode only)

Administrator connections to the GUI/CLI, alert email, and many other features require that you configure at least one network interface with an IP address. For details, see Configuring the network interfaces.

Exceptions include virtual IP addresses on active-passive HA. Virtual IP addresses are synchronized because, upon failover, the secondary unit must start to use them. This mechanism allows the new primary unit to receive connections instead of the failed primary unit. See Virtual IP address (or Virtual IPv6 address).

Management IP address

(transparent mode only)

Each FortiMail unit in the HA cluster should be configured with different management IP addresses for GUI and CLI connectivity purposes. For details, see About the management IP.

SNMP system information

Each FortiMail unit in the HA cluster will have its own SNMP system information, including the Description, Location, and Contact. For details, see Configuring SNMP queries and traps.

RAID configuration

RAID settings are hardware-dependent and determined at boot time by looking at the drives (for software RAID) or the controller (hardware RAID), and are not stored in the system configuration. Therefore, they are not synchronized.

Some HA settings

Product name and icon

The product name and icon under System > Customization > Appearance are not synchronized. All other appearance settings are synchronized.

Miscellaneous settings
(active-active HA only)

In active-active HA, the following settings are not synchronized:

All system, domain, and user level block/safe lists are synchronized.

Note

User data is synchronized at predefined time intervals, not in real time.

See also

About HA heartbeat and synchronization

Synchronization of MTA queue directories after a failover

During normal operation in active-passive HA, email is either:

  • being received or sent by the primary FortiMail unit or cluster
  • waiting to be delivered in the mail queue
  • stored in the primary’s mail data directories (quarantines, email archives, and, for server mode, email inboxes)

When a failure occurs, sending and receiving is interrupted. The delivery attempt fails. Usually, the sender retries. However, stored email remains in the mail data directories.

To prevent data loss when a primary fails, you usually should enable Synchronize mail data directory (unless NAS storage is used), but do not need to enable Synchronize MTA queue directory. This is because of an automatic recovery mechanism in FortiMail HA failover.

  1. The secondary unit detects that the primary unit has failed, and becomes the new primary.

  2. If the failed unit can reboot, it detects the new primary unit.

  3. The former primary unit pushes its mail queue to the new primary unit.

    This synchronization occurs through the heartbeat link between the primary and secondary units, and prevents duplicate email messages from forming in the primary unit’s mail queue.

  4. The new primary unit delivers email in its mail queues, including email messages synchronized from the new secondary unit.

As a result, if the failed primary unit can restart, no email is lost from the mail queue.
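Step 3 of this recovery can be modeled as merging two queues without creating duplicates. The sketch below assumes that each queued message has a unique message ID that can serve as a key. That keying scheme is an assumption of the example, not a description of FortiMail's queue format.

```python
from typing import Dict


def merge_queue(new_primary_queue: Dict[str, bytes],
                former_primary_queue: Dict[str, bytes]) -> Dict[str, bytes]:
    """Merge the former primary's mail queue into the new primary's queue.

    Queues map a message ID to the message. A message that is already
    present in the new primary's queue is kept once, not duplicated.
    """
    merged = dict(new_primary_queue)
    for message_id, message in former_primary_queue.items():
        merged.setdefault(message_id, message)
    return merged
```

After the merge, the new primary delivers everything in the combined queue, which is why a former primary that can restart does not lose queued email.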

See also

About HA heartbeat and synchronization

Storing mail data from HA clusters on a NAS server

Storing mail data from HA clusters on a NAS server

In active-active HA, if FortiMail units are operating in server mode, you must store mail data centrally on a network attached storage (NAS) server — not on each FortiMail unit. Otherwise users’ email and other data could be scattered across multiple FortiMail units, and it won't be available when they connect to another.

For other HA and operating modes, it also may be better to store mail data on a NAS server.

For example, regular NAS server backups help to prevent mail data loss, even if a FortiMail unit has hardware failure. Also, during a temporary failure of a FortiMail unit, you can still access the mail data on the NAS server. When the FortiMail unit restarts, it can usually continue to access and use the mail data stored on the NAS server.

For active-active HA with a NAS server, only the primary unit sends quarantine reports to email users. The primary unit also acts as a proxy between email users and the NAS server when email users use FortiMail webmail to access quarantined email.

For active-passive HA, the primary unit stores all mail data on the NAS server in the same way as a standalone unit. If a failover occurs, the new primary unit also uses the same NAS server, and continues operating with no loss of mail data.

Note

If FortiMail units are in active-passive HA, and store mail data on a remote NAS server, disable Synchronize mail data directory to reduce redundant network traffic and save bandwidth.
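The storage guidance in this section can be condensed into a small helper. The function and the dictionary keys below are hypothetical names chosen for the example. They are not CLI or GUI setting identifiers.

```python
def mail_data_plan(ha_mode: str, operation_mode: str, uses_nas: bool) -> dict:
    """Summarize the mail data storage guidance for one HA cluster."""
    if ha_mode == "active-active":
        if operation_mode == "server" and not uses_nas:
            raise ValueError("active-active HA in server mode requires NAS storage")
        # Mail data synchronization applies only to active-passive HA.
        return {"store_on_nas": uses_nas, "synchronize_mail_data_directory": False}
    if ha_mode == "active-passive":
        # With a NAS server, synchronizing mail data over HA links is redundant.
        return {"store_on_nas": uses_nas,
                "synchronize_mail_data_directory": not uses_nas}
    raise ValueError(f"unknown HA mode: {ha_mode}")
```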

For instructions on storing mail data on a NAS server, see Selecting the mail data storage location.

See also

About HA heartbeat and synchronization

Synchronization of MTA queue directories after a failover

About logging, alert email, and SNMP for HA

For faster discovery and diagnosis of network problems that have caused an HA failover, you can configure SNMP, Syslog, and/or alert email to monitor FortiMail HA.

To configure logging and alert email, configure the primary unit and enable HA events. When the configuration changes are synchronized to secondary units, all FortiMail units in the HA cluster or group of clusters record their own separate log messages and send separate alert email messages. Log data is not synchronized.

Note

To distinguish alert email from each FortiMail unit in HA, configure a different host name for each. For details, see Host name.

To use SNMP to monitor HA failover, configure each cluster member to enable HA events for the SNMP community.

See also

Configuring SNMP queries and traps

Logs, reports, and alerts

About HA heartbeat and synchronization

Configuring HA

Depending on your HA deployment scenario, use the following procedures to deploy either the member or group type of HA.

After you configure HA, usually administrators connect only to the primary unit. Changes made to the primary unit are synchronized to the secondary units. See About HA heartbeat and synchronization.

Exceptions include:

Deploying member HA

The following procedures describe how to set up a FortiMail pair or cluster in the member HA type.

  1. Register all FortiMail units in the HA cluster with the Fortinet Technical Support web site:

    https://support.fortinet.com/

    If you use licensed features such as centralized HA monitoring, FortiGuard Antivirus, and/or FortiGuard Antispam, you must purchase and register licenses for each unit.

    Note

    You can mix different models in FortiMail HA. However:

  2. Design a network topology that avoids a single point of failure.

    For example, if there is only one router or firewall or ISP link, and it fails, then service downtime will occur even if the FortiMail HA cluster and email servers are still operating normally. To avoid this risk, if possible, all devices and links should be redundant. (Connect each FortiMail unit to two gateway routers, etc. You may need more network cables and devices to achieve this.)

  3. Connect the network interfaces that will be used for HA heartbeat and synchronization. At least one heartbeat link is required.

    For example, you could use a network cable to directly connect FortiMail A's port2 to FortiMail B's port2.

    Caution

    To minimize failovers and sync disruptions, create two heartbeat links. Either:

    • Directly link each pair of heartbeat ports with an Ethernet crossover cable.
    • Connect each pair through an isolated, dedicated local switch.

    This guarantees bandwidth and lower latency for the synchronization and heartbeat, even if one cable is accidentally disconnected. For better reliability, also enable Remote services as heartbeat.

    Note

    Don't use DHCP IP addresses for heartbeat links. DHCP can be the default or common for VM instances in cloud deployments, but DHCP can disrupt the HA heartbeat link when an IP address has not been assigned yet by the DHCP server, such as:

    • during firmware upgrades

    • if DHCP clients have an IP address conflict

    • if DHCP reservations fail

    Use static IP addresses instead.

    Don't disconnect heartbeat links once HA is enabled. If the heartbeat is interrupted, then the secondary will assume that the primary has failed, and become the new primary. If no failure has actually occurred, however, both FortiMail units will be operating as primary units at the same time (a "split brain"). This disrupts synchronization and could cause scattered data. In active-passive HA, it can also cause IP address conflicts. To correct the role on a unit that should be secondary, click Restore.

  4. If you will use active-passive HA with gateway or server operation mode, add a Virtual IP address (or Virtual IPv6 address) and Virtual hostname to the network interface on the primary unit that receives email connections.

    On internal DNS servers, update records to use this virtual IP address, not the physical IP address.

    On public DNS servers, records should still use the public IP address. If your router or firewall applies NAT, this IP address may be on their WAN or gateway interface, not the virtual IP address on FortiMail.

    Wait for the DNS records to propagate to non-authoritative DNS servers before you enable HA. This prevents service disruptions.

    Figure: Topology with virtual IP address for active-passive HA (the virtual IP address transfers to the secondary upon failure of the primary FortiMail unit in an active-passive HA cluster)

  5. If you will use active-active HA, configure storage of mail data on a NAS server. See Storing mail data from HA clusters on a NAS server. (Active-passive HA can also benefit from a NAS server, but does not require it.)

    Caution

    For active-active HA with server mode, you must store mail data externally on a NAS server. Failure to store mail data externally could result in mailboxes and other data scattered over multiple FortiMail units.

  6. If you will use remote service monitoring (SMTP etc.), then enable those services on the heartbeat network interfaces. See Mail access.

  7. On the FortiMail unit that will be the primary in the HA cluster, go to System > High Availability > Configuration and:

    1. Configure the following:

      GUI item

      Description

      State

      Enable or disable HA.

      Type

      Select Member. See About HA types.

      HA mode

      Select either Active-Active or Active-Passive. See About HA modes.

      Action on failure

      Select what the primary unit will do after it fails (if it can recover), either:

      • Switch off immediately — Do not automatically rejoin the HA cluster. To manually rejoin it to the cluster with its configured Member role, click Restore.
      • Wait for recovery — Automatically rejoin the cluster, but the Effective becomes Secondary. To manually restore the FortiMail unit to acting in its configured Member role, click Restore.
      • Wait for recovery and switch to configured role — Automatically rejoin the cluster, but the Effective becomes Primary again. The secondary unit that was temporarily acting as primary also automatically becomes Secondary again. This option may be useful if the cause of failure is temporary and rare, but may cause problems if the cause of failure is recurring, resulting in many extra role changes.

      Tip: In most cases, you should select Wait for recovery.

      Shared password

      Enter a password for this HA cluster.

      Before FortiMail units in the HA cluster synchronize with each other, they verify that they have the same password. This prevents them from accidentally synchronizing with the wrong cluster. Therefore you must enter the same HA password on all of them.

    2. Expand the Member section. For each FortiMail unit in the HA cluster, click New and configure the following:

      GUI item

      Description

      Name

      Enter the name of this unit in the HA cluster.

      Member role

      Select the role of the FortiMail unit in the HA cluster, either Primary or Secondary.

      Each FortiMail unit's role in the HA cluster is not synchronized because this distinguishes the primary and secondary units.

      Effects of the role vary by HA mode. See About HA modes.

      Use current device

      Click to automatically fill out the following fields with the current device information.

      IPv4 address
      (or IPv6 address)

      Enter the IP address of the network interface that will listen for the heartbeat and synchronization.

      Alternatively, to define a heartbeat interface, instead use Host name.

      If you want more heartbeat interfaces, click + and then add those IP addresses.

      Alternatively, if you are currently configuring the device that you are adding to the table, click Use Current Device.

      Note: You must also bring up and then enable Heartbeat status on the interface. If it is disabled, but the IP address is configured here, then HA will detect that the heartbeat link has failed.

      Host name

      Enter the hostname of the network interface that will listen for the heartbeat and synchronization.

      Alternatively, to define a heartbeat interface, instead use IPv4 address (or IPv6 address).

      Note: You must also bring up and then enable Heartbeat status on the interface. If it is disabled, but the hostname is configured here, then HA will detect that the heartbeat link has failed.

      Tip: Use a hostname to define the heartbeat interface (not an IP address) in environments where IP addresses change often, such as with VMs and containers.

      Heartbeat hostnames might not be the same as the SMTP relay/proxy hostname (Host name in mail settings) and virtual hostname for active-passive HA (Virtual hostname). If they are the same, however, then you can click Use Current Device to automatically paste the MTA hostname into this field.

      Primary backup

      If HA mode is Active-Active, then there can be many secondary units. Enable this setting if Member role is Secondary, and you want to select this member to become the new primary when a failure is detected.

      Note: Usually you should have a primary backup. Otherwise configuration synchronization will be interrupted upon failure. See About HA heartbeat and synchronization.

      Comment

      Optional. Enter a descriptive comment.

    3. If the HA mode is active-passive, configure the Virtual IP address (or Virtual IPv6 address) that will transfer upon failover.
    4. If the HA cluster stores mail data on NAS, disable Synchronize mail data directory.

    5. Optionally, configure:

    6. Click Apply.
  8. Repeat the previous steps for secondary units.

    Except for Shared password, Member role, and the IP address or hostname of the primary that the secondary is connecting to, skip most settings.

  9. If the HA mode is active-active, configure the load balancer with either remote service monitoring or interface monitoring to detect failed FortiMail units, and to redirect and balance connections among available FortiMail units.

  10. Monitor the status of each cluster member. For details, see Monitoring HA status, Logs, reports, and alerts, and Centrally monitoring the HA cluster.
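Before entering the member table in the GUI, it can help to sanity-check the plan on paper. The following Python sketch checks a planned member list against rules stated in this guide (unit counts per HA mode, identical shared password, a heartbeat address or host name per member, primary backup on secondaries). The dictionary layout and function name are hypothetical, and the "exactly one primary" rule is this example's reading of the Member role setting.

```python
from typing import Dict, List

# Unit limits taken from the HA mode comparison in this guide.
UNIT_LIMITS = {"active-passive": (2, 2), "active-active": (2, 24)}


def check_member_plan(ha_mode: str, members: List[Dict]) -> List[str]:
    """Return a list of problems found in a planned HA member table."""
    problems = []
    low, high = UNIT_LIMITS[ha_mode]
    if not low <= len(members) <= high:
        problems.append(f"{ha_mode} expects {low}-{high} units, got {len(members)}")
    primaries = [m["name"] for m in members if m["role"] == "primary"]
    if len(primaries) != 1:
        problems.append(f"expected exactly one primary, found {len(primaries)}")
    if len({m["shared_password"] for m in members}) > 1:
        problems.append("shared password is not identical on all members")
    for m in members:
        if not m.get("heartbeat"):
            problems.append(f"{m['name']}: no heartbeat IP address or host name")
        if m.get("primary_backup") and m["role"] != "secondary":
            problems.append(f"{m['name']}: primary backup applies to secondaries only")
    if ha_mode == "active-active" and not any(m.get("primary_backup") for m in members):
        problems.append("no primary backup selected")
    return problems
```

An empty result means the plan passed these checks only. It does not replace monitoring the HA status after the cluster is formed.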

See also

About HA types

About HA modes

About HA heartbeat and synchronization

Settings that are not synchronized by HA

Deploying group HA

The following procedures describe how to set up group HA with multiple FortiMail unit clusters.

  1. Register all of the FortiMail units as described in Deploying member HA.

  2. Connect the network interfaces as described in Deploying member HA.

  3. On the primary cluster's primary unit:

    1. Go to System > High Availability > Configuration.

    2. Configure the HA settings as described in Deploying member HA.

    3. From Type, select Group.

      The Group section becomes available.

      HA mode and Action on failure now apply across the group of clusters (not to this primary unit's individual cluster). If required, reconfigure those settings.

    4. Expand the Group section.

    5. Click New. Configure the following settings, and then click OK and Apply.

      Repeat this step for each primary and secondary cluster in the group of clusters.

      GUI item

      Description

      Name

      Enter the name of the HA cluster.

      Group role

      Select the cluster's role, either Off, Primary, or Secondary.

      Member mode

      Select the group of clusters' mode, either Active-Active (A-A) or Active-Passive (A-P).

      Comment

      Optional. Enter a description or comment.

  4. On other units:

    1. Go to System > High Availability > Configuration.

    2. If you recently changed HA settings on the primary cluster's primary unit, then click the Refresh icons next to the Group and Member sections to get the current entries.
    3. Click Join an existing HA cluster.

    4. Configure the following settings:

      GUI item

      Description

      Primary device IP

      Enter the IP address of the primary cluster's primary unit.

      Shared password

      Enter the Shared password that was configured on the primary unit.

      Join with name

      Enter the unit's name in the cluster.

      Join HA group

      Enable this option.

      Group name

      Select which cluster to join.

    5. Click Confirm and Join.

      This option is only available if HA is not already configured on the unit. If HA has been configured before, cancel and go to the next step instead.

  5. If member HA is already configured on the unit, and you want it to join group HA:

    1. Go to System > High Availability > Configuration.

    2. From Type, select Group.

    3. Expand the Member section.

    4. Double-click to edit the member.

    5. Configure the following:

      GUI item

      Description

      Group name

      Select which HA group to join.

      This setting is available only if Type is Group.

    6. Click OK, and then click Apply.

      The primary unit in the primary HA cluster will collect and populate the HA information on other primary units in the secondary HA clusters, which will then propagate the information to their secondary units.

See also

Deploying member HA

About HA types

About HA modes

Advanced Option section

  1. Go to System > High Availability > Configuration.

  2. Expand the Advanced Option section.

  3. Configure the following and then click Apply:

    GUI item

    Description

    Synchronize mail data directory

    Enable if the HA cluster does not store its mail data on a NAS server, and you need to use HA communications to synchronize its system quarantine, per-recipient quarantines, email archives, email users’ preferences, and (server mode only) mailboxes.

    This setting applies only if HA mode is Active-Passive.

    Note

    You can manually initiate a data synchronization whenever significant changes occur. See Start configuration sync.

    Synchronize MTA queue directory

    Enable if you want to synchronize the mail queue with FortiMail units in the HA cluster.

    This setting applies only if HA mode is Active-Passive.

    Note

    If the primary unit experiences a hard drive failure and you cannot restart it, and if this option is disabled, MTA queue directory data could be lost.

    Note

    If you enable this option, it can reduce performance, and is not guaranteed to prevent data loss. Mail queue directories are very dynamic. Many email messages could be added to the queue between each sync.

    If you disable this option, data loss might not occur, either. After a failover, when the unit rejoins the cluster, a separate synchronization mechanism occurs. This often restores the mail queue. For details, see Synchronization of MTA queue directories after a failover and Managing the mail queues.

    HA base port

    Enter the first of multiple port numbers (see Appendix C: Port Numbers) that will be used for:

    • heartbeat signals
    • synchronization control
    • data synchronization
    • configuration synchronization

    Note

    In addition to a lost heartbeat, other unresponsive network services and hardware failure can also be used to trigger failover. For details, see Service Monitor section and About HA heartbeat and synchronization.

    Note

    In addition to automatic immediate and periodic configuration synchronization, you can also manually initiate synchronization. For details, see Start configuration sync.

    Heartbeat lost threshold

    Enter the amount of time, in seconds, that a primary unit can be unresponsive until HA detects a failure and performs the action in Action on failure.

    Note

    To determine the best heartbeat threshold, monitor your FortiMail unit's performance. Examine how long each high system resource usage lasts. Configure a threshold that is longer than most peak usage. This gives the secondary unit enough time to accurately confirm unresponsiveness, and avoid unnecessary failovers. (Heartbeat responses may be slow during peak load.) See also Using the dashboard, Centrally monitoring the HA cluster, and Troubleshoot resource issues.

    Note

    If you have service level agreements (SLA), then you may be required to keep this time short. If the failure detection time is too long, email delivery could be delayed or fail until HA detects the failure. This reduces service uptime.

    Remote services as heartbeat

    Enable to avoid the Action on failure action if the heartbeat links (see Interface section) temporarily fail, but service monitoring such as for SMTP (see Service Monitor section) detects that the primary unit is still available.

    Note

    The Action on failure action can still occur if the HA process restarts due to system reboot or HA daemon restart. Then it examines the physical heartbeat links first. If they are not found, then failure is detected.

    This setting provides an extra HA heartbeat only, not synchronization. To avoid synchronization problems, do not use remote service monitoring as a heartbeat for a long time. This feature is intended only as a temporary heartbeat until you reestablish a normal primary or secondary heartbeat link.
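The advice for Heartbeat lost threshold (longer than most load peaks, but short enough for any SLA) can be turned into a simple calculation. The sketch below assumes that you have recorded how long, in seconds, each period of high resource usage lasted. The 90 percent coverage default and the function name are assumptions of this example, not Fortinet recommendations.

```python
import math
from typing import Optional, Sequence


def suggest_heartbeat_threshold(peak_durations: Sequence[float],
                                coverage_percent: int = 90,
                                sla_limit: Optional[int] = None) -> int:
    """Suggest a threshold (seconds) that is longer than most observed peaks."""
    if not peak_durations:
        raise ValueError("need at least one observed peak duration")
    ordered = sorted(peak_durations)
    # Position of the shortest peak that covers the requested share of peaks.
    index = max(0, math.ceil(coverage_percent * len(ordered) / 100) - 1)
    threshold = math.floor(ordered[index]) + 1    # strictly longer than that peak
    if sla_limit is not None:
        threshold = min(threshold, sla_limit)     # the SLA caps detection time
    return threshold
```

If the SLA cap is lower than the value suggested by the peaks, unnecessary failovers during peak load become more likely, so that trade-off should be reviewed rather than applied blindly.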

Interface section

This section configures the HA behavior of network interfaces on this FortiMail unit, especially whether they have a:

In a basic HA deployment, the heartbeat interface provides a basic signal to other HA group members about the health of the primary FortiMail unit. However, you can use additional signals. Interface monitoring periodically tests the local network interfaces on the primary unit. If a malfunctioning interface is detected, HA performs the action configured in Action on failure. This can include reconfiguring network interfaces to move the Virtual IP address (or Virtual IPv6 address) and Virtual hostname onto the new primary unit.

  1. Configure the interface monitoring interval and failure detection threshold. See Service Monitor section.
  2. Go to System > High Availability > Configuration.

  3. Expand the Interface section.

  4. Select a row for a network interface in the table, and then click Edit.

  5. Configure the following settings:

    GUI item

    Description

    Heartbeat status

    Enable if this interface will listen for HA heartbeat and synchronization communications.

    Note

    You must enable this option on at least one of the network interfaces that you defined for the unit in IPv4 address (or IPv6 address). Otherwise HA will detect a failure.

    Port

    Displays the name of the network interface that you are configuring.

    Optionally, you can click the name to view or configure its settings. See also Configuring the network interfaces.

    Virtual IP address (or Virtual IPv6 address)

    Enter a virtual IP address and netmask that the primary unit will have on this network interface. Upon failure detection, the secondary will become the new primary and start to use the virtual IP address.

    For gateway mode and server mode, DNS records should be configured to point to the virtual IP address, not the physical IP addresses. See also About HA modes, Configuring the network interfaces, and About IPv6 Support.

    This setting is available only if HA mode is Active-Passive.

    Virtual hostname

    Enter a virtual hostname.

    Similar to behavior with the virtual IP address, the virtual hostname belongs to the current primary unit. Upon failover, the secondary unit becomes the new primary unit, and so it starts to use the virtual hostname instead.

    This setting is available only if HA mode is Active-Passive.

    Enable port monitor

    Enable to monitor the network interface for failure. Connection interval and retries occur according to the interface monitoring settings in Service Monitor section.

Service Monitor section

In the simplest HA deployments, failed FortiMail units are detected by an interrupted heartbeat. However, HA can also detect failure of hardware and network services. Heartbeats detect the general responsiveness of a primary unit, but do not test each daemon (for example, the POP3 or webmail service), hard drive, or physical network port used by non-heartbeat traffic. Therefore you can add hardware and service monitoring to be more specific. Alternatively, if the heartbeat link is briefly disconnected, service monitoring can prevent an unnecessary failover by temporarily acting as a secondary heartbeat.

With service monitoring, the secondary unit connects to the SMTP, POP3, and/or web service (HTTP) on the primary unit to detect failure. For server mode, IMAP service can also be monitored.

With local network interface monitoring and hard drive monitoring, the primary unit monitors its own network interfaces and hard drives.

Hard drive monitoring tests that the local hard drive is still accessible, and that disk space exists for mail data. If the hard disk is not responsive, or if the mail data disk is 95% full, then a failure is detected.
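
The hard drive rule can be sketched in a few lines of Python. This is only an illustration of the stated 95% rule, not FortiMail's implementation: the function names are hypothetical, and treating "95% full" as "95% or more" is an assumption.

```python
import shutil


def mail_disk_failed(used_bytes, total_bytes, full_percent=95):
    """Return True if the disk counts as failed under the 95%-full rule."""
    if total_bytes <= 0:
        # Assumption: a disk that reports no capacity is treated as unresponsive.
        return True
    # Integer comparison avoids floating-point rounding at the boundary.
    return used_bytes * 100 >= total_bytes * full_percent


def check_mail_disk(path):
    """Check a mounted path; an unreadable path counts as a failure."""
    try:
        usage = shutil.disk_usage(path)
    except OSError:
        return True
    return mail_disk_failed(usage.used, usage.total)
```

For example, `check_mail_disk("/")` reports whether the root file system of the machine running the sketch would trip the rule.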

Network interface monitoring tests all network interfaces where:

Alert email, log messages, and SNMP traps (if configured) indicate the specific cause.

To configure hardware and service monitoring

  1. Go to System > High Availability > Configuration.

  2. Expand the Service Monitor section.

  3. Select a row in the table and click Edit.

    For Remote SMTP, Remote IMAP, Remote POP, and Remote HTTP services, configure the following and click OK:

    GUI item

    Description

    Enable

    Enable or disable monitoring for the service.

    Name

    Displays the service name.

    Port

    Enter the listening port number of the service on the primary unit and (active-active HA only) secondary. See also Appendix C: Port Numbers and Mail access.

    Timeout

    Enter the amount of time in seconds to wait for a response when service monitoring tries to connect.

    Interval

    Enter the amount of time in seconds between each try.

    Retries

    Enter the number of consecutive unsuccessful tries that indicates a failure.

    For interface monitoring, configure the following and click OK (to specify which ports are monitored, see Interface section):

    GUI item

    Description

    Interval

    Enter the amount of time in seconds between each try.

    Retries

    Enter the number of consecutive unsuccessful tries that indicates a failure.

    For local hard drive monitoring, configure the following and click OK:

    GUI item

    Description

    Enable

    Enable or disable monitoring of the local hard drive.

    Interval

    Enter the amount of time in seconds between each try.

    Retries

    Enter the number of consecutive unsuccessful tries that indicates a failure.
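
The way Timeout, Interval, and Retries interact can be sketched as follows. This is a minimal Python illustration under assumptions, not the FortiMail daemon: the function names are hypothetical, and a plain TCP connection stands in for the real SMTP, POP3, IMAP, or HTTP checks.

```python
import socket
import time


def tcp_probe(host, port, timeout_s):
    """Return True if a TCP connection succeeds within timeout_s (Timeout)."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def monitor(probe, interval_s, retries, sleep=time.sleep):
    """Call probe() every interval_s until `retries` consecutive tries fail.

    Returns the number of tries made before the failure was declared.
    """
    consecutive_failures = 0
    tries = 0
    while consecutive_failures < retries:
        tries += 1
        if probe():
            consecutive_failures = 0  # any success resets the count
        else:
            consecutive_failures += 1
        if consecutive_failures < retries:
            sleep(interval_s)  # Interval between tries
    return tries
```

A call such as `monitor(lambda: tcp_probe("192.0.2.10", 25, timeout_s=5), interval_s=10, retries=3)` would watch a hypothetical primary unit's SMTP port. Because a single success resets the counter, brief glitches do not trigger a failure; only consecutive unsuccessful tries do.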

See also

About HA heartbeat and synchronization

About logging, alert email, and SNMP for HA

Monitoring HA status

After you configure HA (see Configuring HA), to view the current roles and synchronization status of the HA group, go to System > High Availability > Status. You can also manually initiate some HA actions, such as Sync and Failover.

Most information is automatically populated after the primary unit connects to this unit, and that unit joins the HA cluster. Then HA statuses such as Status are kept up-to-date via the heartbeat.

GUI item

Description

Type

Displays the configured Type.

Mode

Displays the configured HA mode.

Refresh

(button)

Click to get the newest data and display it on System > High Availability > Status.

If the display does not refresh, you may need to click Clear Cache first.

Failover

(button)

Select a FortiMail unit or cluster, and then click this button to manually trigger a failover.

Restore

(button)

Select a FortiMail unit or cluster, and then click to manually restart HA and reset Effective to match the unit's initially configured Member role.

Caution: When a failed unit reboots, don't click Restore until it finishes synchronizing its mail queue and other data with the current primary. If this recovery mechanism is interrupted, data could be lost. For details, see Status and Synchronize mail data directory.

Sync

(button)

Select a FortiMail unit or cluster, and then click to manually initiate configuration synchronization with other FortiMail units in the HA cluster or group of clusters. See also Settings that are not synchronized by HA.

Clear Cache

(button)

Click to reload the heartbeat daemon and its status data to show current information.

Name

Name of the unit and, if there are multiple HA clusters, the Group name.

SN

Serial number.

IP

IPv4 address (or IPv6 address).

Version

Firmware version. A FortiMail unit must run the same firmware version in order to join the HA group, so that the configuration can be synchronized. Exceptions are during updates. See Upgrading firmware on HA units.

Configured

See Combinations of configured and effective HA role.

In active-active HA, the secondary unit that is the Primary backup (if configured) will display Secondary, like other secondary units.

Effective

See Combinations of configured and effective HA role.

After a failure has been detected, this status may not match the initially configured Member role. To return to that role, click Restore.

Status

Displays the status of HA cluster joining, heartbeat, and synchronization. See also Combinations of configured and effective HA role.

  • Running — Normal HA operation. Recent synchronization was successful.

  • Starting — HA processes are starting. This briefly occurs when you enable HA, before the unit joins a cluster.

  • Restarting — The unit is rebooting. Other units in the HA cluster will wait a little longer than the usual HA heartbeat interval in order to allow the reboot to complete without triggering a failover.

  • Paused — HA synchronization has been either manually (see the FortiMail CLI Reference) or automatically paused (while the other unit is Restarting). Normal on other units while you upgrade the primary unit.

  • Stopping — HA processes are shutting down.

  • Unseen — Heartbeats have not yet been detected for this unit, and therefore the unit is not yet joined to the HA cluster. Verify that the unit is powered on, HA is enabled, and the heartbeat interfaces are reachable by other units in the cluster.

  • Vanished — Heartbeat was detected for this unit but then disappeared. Verify that the unit is powered on, HA is enabled, and the heartbeat interfaces are reachable by other units in the cluster.

  • Build Mismatch — Other units in the HA cluster do not have the same firmware version. Configuration synchronization requires that all units in the cluster have the same firmware, since different versions may support different features. Normal only while you upgrade firmware in the cluster.

  • Bad Config — HA processes could not start because the configuration was not valid. Verify the HA settings such as the shared password and heartbeat interfaces.

  • Checking — Running a manually requested configuration checksum verification. If there are errors, it starts a configuration sync. Normal while you run the CLI command diag sys ha sync-status.
  • Config Failed — Unrepairable configuration mismatch was detected. Verify the settings that are not in HA.
  • Failed — HA processes have an internal error. If this occurs again, contact Fortinet Technical Support.
  • Asking for Snapshot — The unit is requesting a configuration checksum from the other units in the cluster or group of clusters. If the checksum is different, then the configurations are out-of-sync. Units in the HA cluster must download the new configuration file.

  • Getting Snapshot — The unit is downloading a configuration snapshot.

  • Resyncing — The unit is trying to synchronize its configuration with other units in the cluster. Usually this occurs after downloading a configuration snapshot, or if it detects a checksum error.

  • Synchronized — Configuration is synchronized.

Up Time

Amount of time that the HA cluster member has been operational.

Last Seen

When this FortiMail unit’s HA daemon last communicated with the others in the HA group to make sure that they are available. See also Heartbeat lost threshold and HA base port.

See also

Centrally monitoring the HA cluster

About HA heartbeat and synchronization

About logging, alert email, and SNMP for HA

Configuring HA

Service Monitor section

Combinations of configured and effective HA role

When a FortiMail HA unit detects a heartbeat or synchronization failure, it may adapt by changing roles, and therefore may no longer be operating in its initially configured Member role.

Combinations of the Configured and Effective columns on System > High Availability > Status indicate whether the unit has joined the HA cluster and is operating normally. The Status column may indicate troubleshooting information.

Configured

Effective

Result

Primary

Primary

Normal for the primary.

Secondary

Secondary

Normal for the secondary.

In active-active HA, however, this can also happen if the primary has failed. (Most of the secondaries continue to show Secondary. Only the unit where you enabled Primary backup has Effective showing Primary.)

Primary
or Secondary

Discovering

Initial HA configuration is complete. The primary is now trying to connect with other HA units to form a heartbeat link.

Primary
or Secondary

Registering

Heartbeat connection succeeded and the unit is joining the cluster.

Primary
or Secondary

Unknown

Initial HA configuration was not able to complete. Therefore the unit could not try to join an HA cluster or group. For example, if the primary is defined, but not the other units, then HA cannot form a heartbeat link yet. This situation should correct itself once all units are configured.

Primary
or Secondary

Off

Either the:

  • heartbeat has failed, and Action on failure is Switch off immediately
  • HA process is starting

and the heartbeat and configuration synchronization are currently stopped.

After the secondary joins an HA cluster or group, problems such as network interruptions could cause the first configuration synchronization to fail. To prevent both the secondary and primary from simultaneously acting as primary ("split brain"), Effective temporarily becomes Off. If the next synchronization fails again, then the secondary's Effective becomes Primary.

To restart HA processes and return the unit to the originally configured role, click Restore.

Primary
or Secondary

Hold Off

The primary is rebooting or upgrading firmware. It asked to wait longer than the usual Heartbeat lost threshold so that the reboot can complete. If the primary does not return, then the secondary performs the action in Action on failure or Primary backup.

Primary

Failed

Remote service monitoring, local hard drive monitoring, or network interface monitoring has detected a failure. If operating in transparent mode, then on System > Network > Interface, the network interface IP/Netmask on the secondary displays Bridging (waiting for recovery).

When you correct the failure, Effective changes to either Secondary or Primary, depending on Action on failure.

Primary

Secondary

The primary failed. A secondary automatically became the new primary. When the failed unit restarted, it detected that there was already a primary in the HA cluster or group, and so now the failed unit is the new secondary.

If you want the failed unit to return to acting as the primary, click Restore.

Secondary

Primary

The secondary detected that the primary failed, and then the secondary became the new primary.

If you want it to return to acting as the secondary, click Restore.

Secondary

Secondary (No Primary)

The secondary detected that the primary failed, but it was not configured as Primary backup. Therefore configuration synchronization cannot occur until you either repair the primary, or manually configure a secondary to become the new primary.

This occurs only if HA mode is Active-active.
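
If you script your own status checks, the table above can be expressed as a lookup. The following Python sketch is a hypothetical helper, not a FortiMail API; the result strings are paraphrased from the table, and how you obtain the Configured and Effective values is outside its scope.

```python
# Results keyed by (Configured, Effective), paraphrased from the table.
EXACT_RESULTS = {
    ("Primary", "Primary"): "Normal for the primary.",
    ("Secondary", "Secondary"): (
        "Normal for the secondary; in active-active HA, "
        "can also mean the primary has failed."
    ),
    ("Primary", "Failed"): (
        "Service, hard drive, or interface monitoring detected a failure."
    ),
    ("Primary", "Secondary"): (
        "Primary failed and rejoined as secondary; click Restore to reclaim the role."
    ),
    ("Secondary", "Primary"): (
        "Secondary took over from a failed primary; click Restore to return."
    ),
    ("Secondary", "Secondary (No Primary)"): (
        "Primary failed and this unit is not the Primary backup."
    ),
}

# Effective values that apply whether Configured is Primary or Secondary.
ANY_ROLE_RESULTS = {
    "Discovering": "Trying to connect with other HA units to form a heartbeat link.",
    "Registering": "Heartbeat connection succeeded; the unit is joining the cluster.",
    "Unknown": "Initial HA configuration could not complete.",
    "Off": "Heartbeat and configuration synchronization are stopped.",
    "Hold Off": "Primary is rebooting or upgrading; waiting longer than the usual threshold.",
}


def describe_roles(configured, effective):
    """Map a (Configured, Effective) pair to a short explanation."""
    key = (configured, effective)
    if key in EXACT_RESULTS:
        return EXACT_RESULTS[key]
    if configured in ("Primary", "Secondary") and effective in ANY_ROLE_RESULTS:
        return ANY_ROLE_RESULTS[effective]
    return "Combination not listed in the table."


print(describe_roles("Secondary", "Primary"))
```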

See also

About HA heartbeat and synchronization

Monitoring HA status

Configuring HA

Service Monitor section

Using high availability (HA)

FortiMail units can be configured to act in a high availability (HA) cluster or group of clusters to increase processing capacity and/or availability, so that your overall deployment uptime is preserved even if some individual hardware or software fails. Deployments may require changes to the network topology or DNS records to achieve this, depending on the HA mode.

This section contains the following topics:

About HA types

Supported FortiMail HA deployment types are:

  • Member HA: Multiple FortiMail units work together in one HA pair or cluster.
  • Group HA: Multiple HA clusters work together in a group.

For example, if you have one data center to protect, you only need one cluster. However if you have multiple data centers for redundancy or capacity, then you can join the clusters together to form an HA group.

Each cluster in an HA group has its own HA mode. At the HA group level, there is also an HA mode that defines throughput or failover amongst the clusters. Depending on your throughput or failover requirements, you can mix the HA modes at the member HA and group HA level.

See also

About HA modes

Deploying member HA

Deploying group HA

About HA modes

FortiMail HA clusters and groups of clusters can operate with HA mode as either:

  • active-passive
  • active-active

This determines network topology, fault tolerance, and total throughput.

| Active-passive HA | Active-active HA |
| --- | --- |
| 2 FortiMail units or clusters | 2-24 FortiMail units or clusters |
| Deployed behind a switch | Deployed either behind a load balancer or with multiple DNS MX records to distribute connections among units or clusters (usually in larger organizations with email server farms) |
| Both configuration* and data synchronized^ | Only configuration* synchronized |
| Only primary unit or HA group processes email | All units process email |
| No data loss^ when hardware fails | Data loss when hardware fails |
| No increased processing capacity | Increased processing capacity |

* For exceptions, see Settings that are not synchronized by HA.

^ For exceptions, see Synchronization of MTA queue directories after a failover.

Figure: Active-passive member HA operating in gateway mode (only the primary unit processes email).

Figure: Active-active member HA operating in gateway mode (all available units process email).

Figure: Group HA with a mix of active-active and active-passive clusters.

When a FortiMail unit or cluster fails, its email traffic is interrupted. SMTP clients usually handle this gracefully, and restart a new connection. Traffic is redirected away from the point of failure by different methods that vary by HA mode:

  • Active-passive: A secondary becomes primary and starts using the Virtual IP address (or Virtual IPv6 address) and/or Virtual hostname. Then it uses ARP to notify the nearby OSI Layer 2 switch or router about the link change, and they automatically redirect traffic.

  • Active-active: The load balancer health check detects a failure. It stops distributing traffic to failed FortiMail units or clusters. Only live FortiMail units continue to receive traffic.

Traffic for other IP addresses, such as administrative connections on the management network interfaces, may continue to reach the failed unit or cluster, depending on your network topology. This can be used to troubleshoot the cause of failure, or to reconfigure HA.

See also

About HA heartbeat and synchronization

About logging, alert email, and SNMP for HA

Storing mail data from HA clusters on a NAS server

Mixing HA modes

Mix the cluster's HA mode and the group's HA mode, if needed for your use case. For example, to increase service uptime and reduce data loss risks, you could join active-passive HA clusters together into an active-passive group. This reduces the risk of data loss a little more than a standalone active-passive cluster. However, it also reduces throughput to a fraction of what is possible, because only the primary unit of the primary cluster (1/4 of your total FortiMail units, with two 2-unit clusters) will be actively processing email. Instead, you may prefer to mix HA modes for a balance of availability and performance: active-passive clusters in an active-active group.

Synchronization varies by HA mode. Therefore a mix of modes can change which settings are synchronized.
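
The 1/4 figure above follows from simple arithmetic: an active-passive level has one active member, while an active-active level has all members active. The following Python sketch shows the calculation. The function name and parameters are hypothetical, and it assumes that all units are healthy and that every cluster is the same size.

```python
from fractions import Fraction


def active_fraction(cluster_mode, group_mode, units_per_cluster, clusters):
    """Fraction of all units that actively process email."""

    def active_members(mode, count):
        if mode == "active-passive":
            return 1  # only the primary is active
        if mode == "active-active":
            return count  # every member is active
        raise ValueError(f"unknown HA mode: {mode}")

    active_units = active_members(cluster_mode, units_per_cluster) * active_members(
        group_mode, clusters
    )
    return Fraction(active_units, units_per_cluster * clusters)


# Two 2-unit active-passive clusters in an active-passive group.
print(active_fraction("active-passive", "active-passive", 2, 2))  # 1/4
# The same clusters in an active-active group.
print(active_fraction("active-passive", "active-active", 2, 2))  # 1/2
```

Switching only the group level to active-active doubles the share of units that process email, while each cluster keeps its own failover and data synchronization.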

About HA heartbeat and synchronization

Heartbeat network interfaces:

  • monitor for failure of the primary unit in the HA cluster or group of clusters (a health check)

  • synchronize configuration changes from the primary unit to secondaries (and in groups of clusters, from the primary cluster to the secondaries)

    For exceptions, see Settings that are not synchronized by HA.

  • (active-passive only, and only if enabled) synchronize the mail queue, FortiMail system mail directory, and user home directories

    For exceptions, see Storing mail data from HA clusters on a NAS server.

Note

Synchronization intervals vary.

  • FortiGuard Antispam and FortiGuard Antivirus packages: Not synchronized.
  • Mail queue: Up to 20 minutes (not real time).
  • Configuration: Real time.

If configuration synchronization did not occur when expected, or if you have inadvertently de-synchronized the secondary unit’s configuration (for example, if a cable was accidentally disconnected), then you can manually initiate synchronization on either the primary unit or the secondary unit.

Periodically, secondary units verify configuration synchronization. If it's not in sync, then secondary units pull changes from the primary, and reload the configuration. In active-active HA, block list and safe list changes are also pushed from the secondary to the primary unit, and then synchronized to all other secondary units.
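
Conceptually, the periodic verification is a checksum comparison followed by a pull when the checksums differ. The Python sketch below illustrates the idea only. SHA-256 is a stand-in, since the checksum algorithm that FortiMail uses is not stated here, and the function names are hypothetical.

```python
import hashlib


def config_checksum(config_text):
    """Checksum of a configuration (SHA-256 is an assumed stand-in)."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()


def verify_and_sync(primary_config, secondary_config):
    """Return (config the secondary should run, whether a pull was needed)."""
    if config_checksum(primary_config) == config_checksum(secondary_config):
        return secondary_config, False  # already in sync
    return primary_config, True  # pull the primary's configuration and reload


config, pulled = verify_and_sync("set hostname mail-a", "set hostname mail-old")
print(pulled)  # True
```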

Note

Due to the introduction of primary backup in active-active HA in FortiMail 7.4.0, communication between the secondary units is also required. In config-only HA before the 7.4.0 release, it was not required.

Heartbeats from the primary to secondaries must not be interrupted. Exceptions include when:

  • the primary reboots, or you enter the execute reload command in the CLI. If the primary unit reboots or reloads its configuration, then it signals to the secondary unit to wait for additional time.
  • Remote services as heartbeat is enabled.

Note

Remote service monitoring does not provide synchronization, and therefore is not a complete, long-term replacement for the heartbeat. Heartbeat links that are disrupted should be fixed as soon as possible.

If the heartbeat signal is lost, or if the primary detects failure via another method (hard drive monitoring, interface monitoring, or remote service monitoring), then behavior varies by HA mode:

  • Active-passive: A secondary unit or cluster becomes the new primary, and starts processing email ("failover").

  • Active-active: If Primary backup has been selected, then your preferred secondary unit or cluster will take over the role of the primary (Effective becomes Primary) for the purpose of configuration synchronization. All units continue to process email, except the failed unit.

    If a Primary backup is not selected, or if the new primary also fails, then each secondary continues as a secondary. However, with no designated primary unit, changes to the configuration are not synchronized anymore. Repair the primary or reconfigure the HA cluster to form a heartbeat with a new primary.

Some failure causes are temporary. Depending on Action on failure, even if the primary can recover, it might not automatically return to the HA cluster or group of clusters and reclaim its role. You can manually trigger this with Restore.

See also

Service Monitor section

About HA types

About HA modes

About HA port numbers and protocols

About logging, alert email, and SNMP for HA

Settings that are not synchronized by HA

Storing mail data from HA clusters on a NAS server

Synchronization of MTA queue directories after a failover

About HA port numbers and protocols

The default protocol and port numbers for HA heartbeat, synchronization, and service monitoring communications are configurable. See HA base port, the control packet setting in the FortiMail CLI Reference, and Appendix C: Port Numbers.

Note

If a firewall is between the primary and secondary FortiMail unit, then verify that the firewall policy allows HA port numbers. Blocked HA ports can cause incorrect failover and synchronization failure.

Settings that are not synchronized by HA

All settings on the primary unit are synchronized to the secondary unit, except the following:

Settings

Explanation

Licenses

FortiGuard Antivirus, FortiGuard Antispam, and other service subscription and feature licenses are specific to each FortiMail unit, and are not synchronized, regardless of HA mode.

Operation mode

You must set the operation mode (gateway, transparent, or server) of each FortiMail unit before they join HA. Many settings vary by operation mode, and therefore configurations cannot be synchronized if the operation mode is different.

Host name

Different host names are used to distinguish members of the HA cluster when connecting to the GUI and to indicate which unit failed. For details, see Host name.

Static route

Static routes are not synchronized because some or all of the network interfaces on each FortiMail unit in the HA cluster may be connected to different subnets. See also Configuring static routes.

Interface configuration

(gateway and server mode only)

Administrator connections to the GUI/CLI, alert email, and many other features require that you configure at least one network interface with an IP address. For details, see Configuring the network interfaces.

Exceptions include virtual IP addresses on active-passive HA. Virtual IP addresses are synchronized because, upon failover, the secondary unit must start to use them. This mechanism allows the new primary unit to receive connections instead of the failed primary unit. See Virtual IP address (or Virtual IPv6 address).

Management IP address

(transparent mode only)

Each FortiMail unit in the HA cluster should be configured with different management IP addresses for GUI and CLI connectivity purposes. For details, see About the management IP.

SNMP system information

Each FortiMail unit in the HA cluster will have its own SNMP system information, including the Description, Location, and Contact. For details, see Configuring SNMP queries and traps.

RAID configuration

RAID settings are hardware-dependent and determined at boot time by looking at the drives (for software RAID) or the controller (hardware RAID), and are not stored in the system configuration. Therefore, they are not synchronized.

Some HA settings

Product name and icon

The product name and icon under System > Customization > Appearance are not synchronized. All other appearance settings are synchronized.

Miscellaneous settings
(active-active HA only)

In active-active HA, the following settings are not synchronized:

All system, domain, and user level block/safe lists are synchronized.

Note

User data is synchronized at predefined time intervals, not in real time.

See also

About HA heartbeat and synchronization

Synchronization of MTA queue directories after a failover

During normal operation in active-passive HA, email is either:

  • being received or sent by the primary FortiMail unit or cluster
  • waiting to be delivered in the mail queue
  • stored in the primary’s mail data directories (quarantines, email archives, and, for server mode, email inboxes)

When a failure occurs, sending and receiving is interrupted. The delivery attempt fails. Usually, the sender retries. However, stored email remains in the mail data directories.

To prevent data loss when a primary fails, you usually should enable Synchronize mail data directory (unless NAS storage is used), but do not need to enable Synchronize MTA queue directory. This is because of an automatic recovery mechanism in FortiMail HA failover.

  1. The secondary unit detects that the primary unit has failed, and becomes the new primary.

  2. If the failed unit can reboot, it detects the new primary unit.

  3. The former primary unit pushes its mail queue to the new primary unit.

    This synchronization occurs through the heartbeat link between the primary and secondary units, and prevents duplicate email messages from forming in the primary unit’s mail queue.

  4. The new primary unit delivers email in its mail queues, including email messages synchronized from the new secondary unit.

As a result, if the failed primary unit can restart, no email is lost from the mail queue.
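
The recovery steps can be pictured as a merge that skips messages the new primary already holds. The Python sketch below is a simplified model, not FortiMail's internal mechanism: keying queues by a message ID, and the function name, are assumptions for illustration.

```python
def merge_mail_queues(new_primary_queue, former_primary_queue):
    """Merge two queues keyed by message ID, skipping IDs already queued."""
    merged = dict(new_primary_queue)
    for message_id, message in former_primary_queue.items():
        # setdefault keeps the existing entry, so no duplicate is created.
        merged.setdefault(message_id, message)
    return merged


new_primary = {"msg-2": "queued on new primary", "msg-3": "queued on new primary"}
former_primary = {"msg-1": "queued before failure", "msg-2": "queued before failure"}
print(sorted(merge_mail_queues(new_primary, former_primary)))
```

In this model, `msg-1` is recovered from the former primary, while `msg-2` is not duplicated because the new primary already holds it.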

See also

About HA heartbeat and synchronization

Storing mail data from HA clusters on a NAS server

Storing mail data from HA clusters on a NAS server

In active-active HA, if FortiMail units are operating in server mode, you must store mail data centrally on a network attached storage (NAS) server, not on each FortiMail unit. Otherwise users' email and other data could be scattered across multiple FortiMail units, and it won't be available when they connect to another unit.

For other HA and operating modes, it also may be better to store mail data on a NAS server.

For example, regular NAS server backups help to prevent mail data loss, even if a FortiMail unit has hardware failure. Also, during a temporary failure of a FortiMail unit, you can still access the mail data on the NAS server. When the FortiMail unit restarts, it can usually continue to access and use the mail data stored on the NAS server.

For active-active HA with a NAS server, only the primary unit sends quarantine reports to email users. The primary unit also acts as a proxy between email users and the NAS server when email users use FortiMail webmail to access quarantined email.

For active-passive HA, the primary unit stores all mail data on the NAS server in the same way as a standalone unit. If a failover occurs, the new primary unit also uses the same NAS server, and continues operating with no loss of mail data.

Note

If FortiMail units are in active-passive HA, and store mail data on a remote NAS server, disable Synchronize mail data directory to reduce redundant network traffic and save bandwidth.

For instructions on storing mail data on a NAS server, see Selecting the mail data storage location.

See also

About HA heartbeat and synchronization

Synchronization of MTA queue directories after a failover

About logging, alert email, and SNMP for HA

For faster discovery and diagnosis of network problems that have caused an HA failover, you can configure SNMP, Syslog, and/or alert email to monitor FortiMail HA.

To configure logging and alert email, configure the primary unit and enable HA events. When the configuration changes are synchronized to secondary units, all FortiMail units in the HA cluster or group of clusters record their own separate log messages and send separate alert email messages. Log data is not synchronized.

Note

To distinguish alert email from each FortiMail unit in HA, configure a different host name for each. For details, see Host name.

To use SNMP to monitor HA failover, configure each cluster member to enable HA events for the SNMP community, such as:

See also

Configuring SNMP queries and traps

Logs, reports, and alerts

About HA heartbeat and synchronization

Configuring HA

Depending on your HA deployment scenario, use the following procedures to deploy either the member or group type of HA.

After you configure HA, usually administrators connect only to the primary unit. Changes made to the primary unit are synchronized to the secondary units. See About HA heartbeat and synchronization.

Exceptions include:

Deploying member HA

The following procedures describe how to set up a FortiMail pair or cluster in the member HA type.

  1. Register all FortiMail units in the HA cluster with the Fortinet Technical Support web site:

    https://support.fortinet.com/

    If you use licensed features such as centralized HA monitoring, FortiGuard Antivirus, and/or FortiGuard Antispam, you must purchase and register licenses for each unit.

    Note

    You can mix different models in FortiMail HA. However:

  2. Design a network topology that avoids a single point of failure.

    For example, if there is only one router or firewall or ISP link, and it fails, then service downtime will occur even if the FortiMail HA cluster and email servers are still operating normally. To avoid this risk, if possible, all devices and links should be redundant. (Connect each FortiMail unit to two gateway routers, etc. You may need more network cables and devices to achieve this.)

  3. Connect the network interfaces that will be used for HA heartbeat and synchronization. At least one heartbeat link is required.

    For example, you could use a network cable to directly connect FortiMail A's port2 to FortiMail B's port2.

    Caution

    To minimize failovers and sync disruptions, create two heartbeat links. Either:

    • Directly link each pair of heartbeat ports with an Ethernet crossover cable.
    • Connect each pair through an isolated, dedicated local switch.

    This guarantees bandwidth and lower latency for the synchronization and heartbeat, even if one cable is accidentally disconnected. For better reliability, also enable Remote services as heartbeat.

    Note

    Don't use DHCP IP addresses for heartbeat links. DHCP can be the default or common for VM instances in cloud deployments, but DHCP can disrupt the HA heartbeat link when an IP address has not been assigned yet by the DHCP server, such as:

    • during firmware upgrades

    • if DHCP clients have an IP address conflict

    • if DHCP reservations fail

    Use static IP addresses instead.

    Don't disconnect heartbeat links once HA is enabled. If the heartbeat is interrupted, then the secondary will assume that the primary has failed, and become the new primary. If no failure has actually occurred, however, both FortiMail units will be operating as primary units at the same time (a "split brain"). This disrupts synchronization and could cause scattered data. In active-passive HA, it can also cause IP address conflicts. To correct the role on a unit that should be secondary, click Restore.

  4. If you will use active-passive HA with gateway or server operation mode, add a Virtual IP address (or Virtual IPv6 address) and Virtual hostname to the network interface on the primary unit that receives email connections.

    On internal DNS servers, update records to use this virtual IP address, not the physical IP address.

    On public DNS servers, records should still use the public IP address. If your router or firewall applies NAT, this IP address may be on their WAN or gateway interface, not the virtual IP address on FortiMail.

    Wait for the DNS records to propagate to non-authoritative DNS servers before you enable HA. This prevents service disruptions.

    Topology with virtual IP address for active-passive HA

    virtual IP address transfers to secondary upon failure of primary FortiMail in active-passive HA cluster

  5. If you will use active-active HA, configure storage of mail data on a NAS server. See Storing mail data from HA clusters on a NAS server. (Active-passive HA can also benefit from a NAS server, but does not require it.)

    Caution

    For active-active HA with server mode, you must store mail data externally on a NAS server. Failure to store mail data externally could result in mailboxes and other data scattered over multiple FortiMail units.

  6. If you will use remote service monitoring (SMTP etc.), then enable those services on the heartbeat network interfaces. See Mail access.

  7. On the FortiMail unit that will be the primary in the HA cluster, go to System > High Availability > Configuration and:

    1. Configure the following:

      GUI item

      Description

      State

      Enable or disable HA.

      Type

      Select Member. See About HA types.

      HA mode

      Select either Active-Active or Active-Passive. See About HA modes.

      Action on failure

      Select what the primary unit will do after it fails (if it can recover), either:

      • Switch off immediately — Do not automatically rejoin the HA cluster. To manually rejoin it to the cluster with its configured Member role, click Restore.
      • Wait for recovery — Automatically rejoin the cluster, but the Effective becomes Secondary. To manually restore the FortiMail unit to acting in its configured Member role, click Restore.
      • Wait for recovery and switch to configured role — Automatically rejoin the cluster, but the Effective becomes Primary again. The secondary unit that was temporarily acting as primary also automatically becomes Secondary again. This option may be useful if the cause of failure is temporary and rare, but may cause problems if the cause of failure is recurring, resulting in many extra role changes.

      Tip: In most cases, you should select Wait for recovery.

      Shared password

      Enter a password for this HA cluster.

      Before FortiMail units in the HA cluster synchronize with each other, they verify that they have the same password. This prevents them from accidentally synchronizing with the wrong cluster. Therefore you must enter the same HA password on all of them.

    2. Expand the Member section. For each FortiMail unit in the HA cluster, click New and configure the following:

      GUI item

      Description

      Name

      Enter the name of this unit in the HA cluster.

      Member role

      Select the role of the FortiMail unit in the HA cluster, either Primary or Secondary.

      Each FortiMail unit's role in the HA cluster is not synchronized because this distinguishes the primary and secondary units.

      Effects of the role vary by HA mode. See About HA modes.

      Use current device

      Click to automatically fill out the following fields with the current device information.

      IPv4 address
      (or IPv6 address)

      Enter the IP address of the network interface that will listen for the heartbeat and synchronization.

      Alternatively, to define a heartbeat interface, instead use Host name.

      If you want more heartbeat interfaces, click + and then add those IP addresses.

      Alternatively, if you are currently configuring the device that you are adding to the table, click Use Current Device.

      Note: You must also bring up and then enable Heartbeat status on the interface. If it is disabled, but the IP address is configured here, then HA will detect that the heartbeat link has failed.

      Host name

      Enter the hostname of the network interface that will listen for the heartbeat and synchronization.

      Alternatively, to define a heartbeat interface, instead use IPv4 address (or IPv6 address).

      Note: You must also bring up and then enable Heartbeat status on the interface. If it is disabled, but the hostname is configured here, then HA will detect that the heartbeat link has failed.

      Tip: Use a hostname to define the heartbeat interface (not an IP address) in environments where IP addresses change often, such as with VMs and containers.

      Heartbeat hostnames might not be the same as the SMTP relay/proxy hostname (Host name in mail settings) and virtual hostname for active-passive HA (Virtual hostname). If they are the same, however, then you can click Use Current Device to automatically paste the MTA hostname into this field.

      Primary backup

      If HA mode is Active-Active, then there can be many secondary units. Enable this setting if Member role is Secondary, and you want to select this member to become the new primary when a failure is detected.

      Note: Usually you should have a primary backup. Otherwise configuration synchronization will be interrupted upon failure. See About HA heartbeat and synchronization.

      Comment

      Optional. Enter a descriptive comment.

    3. If the HA mode is active-passive, configure the Virtual IP address (or Virtual IPv6 address) that will transfer upon failover.
    4. If the HA cluster stores mail data on NAS, disable Synchronize mail data directory.

    5. Optionally, configure:

    6. Click Apply.
  8. Repeat the previous steps for secondary units.

    Except for Shared password, Member role, and the IP address or hostname of the primary that the secondary is connecting to, skip most settings.

  9. If the HA mode is active-active, configure the load balancer with either remote service monitoring or interface monitoring to detect failed FortiMail units, and to redirect and balance connections among available FortiMail units.

  10. Monitor the status of each cluster member. For details, see Monitoring HA status, Logs, reports, and alerts, and Centrally monitoring the HA cluster.
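
The three Action on failure options in step 7 differ mainly in the role that a failed primary unit has once it recovers. The following Python sketch restates only that documented outcome. The function name and the constants are hypothetical; they are not part of any FortiMail API or CLI, and the code does not communicate with a FortiMail unit.

```python
# Hypothetical model of the "Action on failure" options described in step 7.
# It only restates the documented outcomes for a configured primary unit.

SWITCH_OFF = "Switch off immediately"
WAIT = "Wait for recovery"
WAIT_AND_SWITCH = "Wait for recovery and switch to configured role"


def effective_role_after_recovery(action_on_failure: str) -> str:
    """Return the Effective role of a configured primary after it recovers."""
    if action_on_failure == SWITCH_OFF:
        # The unit does not rejoin the cluster until you click Restore.
        return "Off"
    if action_on_failure == WAIT:
        # The unit rejoins automatically, but acts as a secondary.
        return "Secondary"
    if action_on_failure == WAIT_AND_SWITCH:
        # The unit rejoins automatically and takes back the primary role.
        return "Primary"
    raise ValueError(f"unknown Action on failure: {action_on_failure!r}")


for option in (SWITCH_OFF, WAIT, WAIT_AND_SWITCH):
    print(f"{option}: {effective_role_after_recovery(option)}")
```

In the first two cases, clicking Restore is still what returns the unit to its configured Member role.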

See also

About HA types

About HA modes

About HA heartbeat and synchronization

Settings that are not synchronized by HA

Deploying group HA

The following procedures describe how to set up group HA with multiple FortiMail unit clusters.

  1. Register all of the FortiMail units as described in Deploying member HA.

  2. Connect the network interfaces as described in Deploying member HA.

  3. On the primary cluster's primary unit:

    1. Go to System > High Availability > Configuration.

    2. Configure the HA settings as described in Deploying member HA.

    3. From Type, select Group.

      The Group section becomes available.

      HA mode and Action on failure now apply across the group of clusters (not to this primary unit's individual cluster). If required, reconfigure those settings.

    4. Expand the Group section.

    5. Click New. Configure the following settings, and then click OK and Apply.

      Repeat this step for each primary and secondary cluster in the group of clusters.

      GUI item

      Description

      Name

      Enter the name of the HA cluster.

      Group role

      Select the cluster's role, either Off, Primary, or Secondary.

      Member mode

      Select the group of clusters' mode, either Active-Active (A-A) or Active-Passive (A-P).

      Comment

      Optional. Enter a description or comment.

  4. On other units:

    1. Go to System > High Availability > Configuration.

    2. If you recently changed HA settings on the primary cluster's primary unit, then click the Refresh icons next to the Group and Member sections to get the current entries.
    3. Click Join an existing HA cluster.

    4. Configure the following settings:

      GUI item

      Description

      Primary device IP

      Enter the IP address of the primary cluster's primary unit.

      Shared password

      Enter the Shared password that was configured on the primary unit.

      Join with name

      Enter the unit's name in the cluster.

      Join HA group

      Enable this option.

      Group name

      Select which cluster to join.

    5. Click Confirm and Join.

      This option is only available if HA is not already configured on the unit. If HA has been configured before, cancel and go to the next step instead.

  5. If member HA is already configured on the unit, and you want it to join group HA:

    1. Go to System > High Availability > Configuration.

    2. From Type, select Group.

    3. Expand the Member section.

    4. Double-click to edit the member.

    5. Configure the following:

      GUI item

      Description

      Group name

      Select which HA group to join.

      This setting is available only if Type is Group.

    6. Click OK, and then click Apply.

      The primary unit in the primary HA cluster will collect and populate the HA information on other primary units in the secondary HA clusters, which will then propagate the information to their secondary units.

See also

Deploying member HA

About HA types

About HA modes

Advanced Option section

  1. Go to System > High Availability > Configuration.

  2. Expand the Advanced Option section.

  3. Configure the following and then click Apply:

    GUI item

    Description

    Synchronize mail data directory

    Enable if the HA cluster does not store its mail data on a NAS server, and you need to use HA communications to synchronize its system quarantine, per-recipient quarantines, email archives, email users’ preferences, and (server mode only) mailboxes.

    This setting applies only if HA mode is Active-Passive.

    Note

    You can manually initiate a data synchronization whenever significant changes occur. See Start configuration sync.

    Synchronize MTA queue directory

    Enable if you want to synchronize the mail queue with FortiMail units in the HA cluster.

    This setting applies only if HA mode is Active-Passive.

    Note

    If the primary unit experiences a hard drive failure and you cannot restart it, and if this option is disabled, MTA queue directory data could be lost.

    Note

    If you enable this option, it can reduce performance, and is not guaranteed to prevent data loss. Mail queue directories are very dynamic. Many email messages could be added to the queue between each sync.

    If you disable this option, data loss might not occur, either. After a failover, when the unit rejoins the cluster, a separate synchronization mechanism occurs. This often restores the mail queue. For details, see Synchronization of MTA queue directories after a failover and Managing the mail queues.

    HA base port

    Enter the first of multiple port numbers (see Appendix C: Port Numbers) that will be used for:

    • heartbeat signals
    • synchronization control
    • data synchronization
    • configuration synchronization

    Note

    In addition to a lost heartbeat, other unresponsive network services and hardware failure can also be used to trigger failover. For details, see Service Monitor section and About HA heartbeat and synchronization.

    Note

    In addition to automatic immediate and periodic configuration synchronization, you can also manually initiate synchronization. For details, see Start configuration sync.

    Heartbeat lost threshold

    Enter the amount of time, in seconds, that a primary unit can be unresponsive until HA detects a failure and performs the action in Action on failure.

    Note

    To determine the best heartbeat threshold, monitor your FortiMail unit's performance. Examine how long each period of high system resource usage lasts. Configure a threshold that is longer than most peak usage. This gives the secondary unit enough time to accurately confirm unresponsiveness, and avoids unnecessary failovers. (Heartbeat responses may be slow during peak load.) See also Using the dashboard, Centrally monitoring the HA cluster, and Troubleshoot resource issues.

    Note

    If you have service level agreements (SLA), then you may be required to keep this time short. If the failure detection time is too long, email delivery could be delayed or fail until HA detects the failure. This reduces service uptime.

    Remote services as heartbeat

    Enable to avoid the Action on failure action if the heartbeat links (see Interface section) temporarily fail, but service monitoring such as for SMTP (see Service Monitor section) detects that the primary unit is still available.

    Note

    The Action on failure action can still occur if the HA process restarts due to system reboot or HA daemon restart. Then it examines the physical heartbeat links first. If they are not found, then failure is detected.

    This setting provides an extra HA heartbeat only, not synchronization. To avoid synchronization problems, do not use remote service monitoring as a heartbeat for a long time. This feature is intended only as a temporary heartbeat until you reestablish a normal primary or secondary heartbeat link.
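
The interaction between Heartbeat lost threshold and Remote services as heartbeat can be sketched in Python as follows. This is a simplified model of the behavior described in the table above, not FortiMail's implementation. The function and parameter names are hypothetical, and whether the threshold comparison is inclusive is an assumption.

```python
def should_perform_action_on_failure(
    seconds_since_last_heartbeat: float,
    heartbeat_lost_threshold: float,
    remote_services_as_heartbeat: bool,
    remote_service_reachable: bool,
) -> bool:
    """Simplified model: does a silent heartbeat count as a primary failure?"""
    # Assumption: the heartbeat is "lost" once the silence exceeds the threshold.
    heartbeat_lost = seconds_since_last_heartbeat > heartbeat_lost_threshold
    if not heartbeat_lost:
        return False
    # With the option enabled, a primary that still answers service
    # monitoring (for example, SMTP) is treated as available.
    if remote_services_as_heartbeat and remote_service_reachable:
        return False
    return True


# Heartbeat silent for 20 s with a 15 s threshold, but SMTP still answers.
print(should_perform_action_on_failure(20, 15, True, True))
print(should_perform_action_on_failure(20, 15, False, True))
```

The sketch leaves out the restart case from the note above: after a reboot or HA daemon restart, the physical heartbeat links are examined first.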

Interface section

This section configures the HA behavior of network interfaces on this FortiMail unit, especially whether they have a:

In a basic HA deployment, the heartbeat interface provides a basic signal to other HA group members about the health of the primary FortiMail unit. However, you can also use additional signals. Interface monitoring periodically tests the local network interfaces on the primary unit. If a malfunctioning interface is detected, HA performs the action configured in Action on failure. This can include reconfiguring network interfaces to move the Virtual IP address (or Virtual IPv6 address) and Virtual hostname onto the new primary unit.

  1. Configure the interface monitoring interval and failure detection threshold. See Service Monitor section.
  2. Go to System > High Availability > Configuration.

  3. Expand the Interface section.

  4. Select a row for a network interface in the table, and then click Edit.

  5. Configure the following settings:

    GUI item

    Description

    Heartbeat status

    Enable if this interface will listen for HA heartbeat and synchronization communications.

    Note

    You must enable this option on at least one of the network interfaces that you defined for the unit in IPv4 address (or IPv6 address). Otherwise HA will detect a failure.

    Port

    Displays the name of the network interface that you are configuring.

    Optionally, you can click the name to view or configure its settings. See also Configuring the network interfaces.

    Virtual IP address (or Virtual IPv6 address)

    Enter a virtual IP address and netmask that the primary unit will have on this network interface. Upon failure detection, the secondary will become the new primary and start to use the virtual IP address.

    For gateway mode and server mode, DNS records should be configured to point to the virtual IP address, not the physical IP addresses. See also About HA modes, Configuring the network interfaces, and About IPv6 Support.

    This setting is available only if HA mode is Active-Passive.

    Virtual hostname

    Enter a virtual hostname.

    Similar to behavior with the virtual IP address, the virtual hostname belongs to the current primary unit. Upon failover, the secondary unit becomes the new primary unit, and so it starts to use the virtual hostname instead.

    This setting is available only if HA mode is Active-Passive.

    Enable port monitor

    Enable to monitor the network interface for failure. Connection interval and retries occur according to the interface monitoring settings in Service Monitor section.
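
Because HA detects a failure when a heartbeat address is configured for a member but Heartbeat status is disabled on the matching interface, it can help to cross-check the two settings before you click Apply. The Python sketch below does this for values that you type in by hand. The dictionary layout, port names, and IP addresses are made-up examples, not an export format from FortiMail.

```python
def heartbeat_addresses_with_problems(member_addresses, interfaces):
    """Return member heartbeat addresses that have no interface with Heartbeat status enabled."""
    enabled_ips = {
        iface["ip"] for iface in interfaces if iface["heartbeat_status"]
    }
    return [addr for addr in member_addresses if addr not in enabled_ips]


# Hypothetical interface table for one unit (documentation-only IP ranges).
interfaces = [
    {"port": "port1", "ip": "192.0.2.10", "heartbeat_status": False},
    {"port": "port2", "ip": "198.51.100.1", "heartbeat_status": True},
    {"port": "port3", "ip": "198.51.100.129", "heartbeat_status": False},
]

# Heartbeat addresses entered for this unit in the Member section.
member_addresses = ["198.51.100.1", "198.51.100.129"]

problems = heartbeat_addresses_with_problems(member_addresses, interfaces)
print(problems)  # ['198.51.100.129']
```

An empty list means that every configured heartbeat address has Heartbeat status enabled on its interface.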

Service Monitor section

In the simplest HA deployments, failed FortiMail units are detected by an interrupted heartbeat. However, HA can also detect failure of hardware and network services. Heartbeats detect the general responsiveness of a primary unit, but do not test each daemon (for example, POP3 or webmail service), hard drive, or physical network port used by non-heartbeat traffic. Therefore you can add hardware and service monitoring to be more specific. Alternatively, if the heartbeat link is briefly disconnected, service monitoring can prevent an unnecessary failover by temporarily acting as a secondary heartbeat.

With service monitoring, the secondary unit connects to the SMTP, POP3, and/or web service (HTTP) on the primary unit to detect failure. For server mode, IMAP service can also be monitored.

With local network interface monitoring and hard drive monitoring, the primary unit monitors its own network interfaces and hard drives.

Hard drive monitoring tests that the local hard drive is still accessible, and that disk space exists for mail data. If the hard disk is not responsive, or if the mail data disk is 95% full, then a failure is detected.
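
The hard drive rule above (inaccessible disk, or mail data disk 95% full) can be illustrated on a general-purpose host with Python's standard library. This is only a sketch of the rule, not the check that FortiMail runs internally. The function name is hypothetical, and treating "95% full" as "at least 95% used" is an assumption.

```python
import shutil

FULL_THRESHOLD = 0.95  # the 95% figure comes from the text above


def mail_disk_failed(path: str, threshold: float = FULL_THRESHOLD) -> bool:
    """Return True if the disk holding `path` is inaccessible or too full."""
    try:
        usage = shutil.disk_usage(path)
    except OSError:
        # An unreadable path is treated like an unresponsive hard disk.
        return True
    if usage.total == 0:
        # No capacity reported, so there is no space for mail data.
        return True
    return usage.used / usage.total >= threshold


# Check the file system that holds the current directory.
print(mail_disk_failed("."))
```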

Network interface monitoring tests all network interfaces where:

Alert email, log messages, and SNMP traps (if configured) indicate the specific cause.

To configure hardware and service monitoring

  1. Go to System > High Availability > Configuration.

  2. Expand the Service Monitor section.

  3. Select a row in the table and click Edit.

    For Remote SMTP, Remote IMAP, Remote POP, and Remote HTTP services, configure the following and click OK:

    GUI item

    Description

    Enable

    Enable or disable monitoring for the service.

    Name

    Displays the service name.

    Port

    Enter the listening port number of the service on the primary unit and (active-active HA only) secondary. See also Appendix C: Port Numbers and Mail access.

    Timeout

    Enter the amount of time in seconds to wait for a response when service monitoring tries to connect.

    Interval

    Enter the amount of time in seconds between each try.

    Retries

    Enter the number of consecutive unsuccessful tries that indicates a failure.

    For interface monitoring, configure the following and click OK (to specify which ports are monitored, see Interface section):

    GUI item

    Description

    Interval

    Enter the amount of time in seconds between each try.

    Retries

    Enter the number of consecutive unsuccessful tries that indicates a failure.

    For local hard drive monitoring, configure the following and click OK:

    GUI item

    Description

    Enable

    Enable or disable monitoring of the local hard drive.

    Interval

    Enter the amount of time in seconds between each try.

    Retries

    Enter the number of consecutive unsuccessful tries that indicates a failure.
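
The Timeout, Interval, and Retries settings above combine into a simple detection rule: a failure is declared after a number of consecutive unsuccessful tries. The Python sketch below illustrates that rule under stated assumptions. A plain TCP connection only approximates a real service check, the function names are hypothetical, and the detection-time estimate assumes that every failed try uses the full timeout before the next interval starts.

```python
import socket


def service_responds(host: str, port: int, timeout: float) -> bool:
    """Try one TCP connection; True if it is accepted within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def failure_detected(results, retries: int) -> bool:
    """Return True once `retries` consecutive tries in `results` were unsuccessful."""
    consecutive = 0
    for ok in results:
        # A successful try resets the count of consecutive failures.
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= retries:
            return True
    return False


def estimated_detection_seconds(timeout: float, interval: float, retries: int) -> float:
    """Rough upper estimate (assumption): each failed try costs timeout + interval."""
    return retries * (timeout + interval)


# One success followed by three failures, with Retries set to 3.
print(failure_detected([True, False, False, False], retries=3))
print(estimated_detection_seconds(timeout=5, interval=10, retries=3))
```

An estimate like this can help when you balance fast failure detection against unnecessary failovers.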

See also

About HA heartbeat and synchronization

About logging, alert email, and SNMP for HA

Monitoring HA status

After you configure HA (see Configuring HA), to view the current roles and synchronization status of the HA group, go to System > High Availability > Status. You can also manually initiate some HA actions, such as Sync and Failover.

Most information is automatically populated after the primary unit connects to this unit, and that unit joins the HA cluster. Then HA statuses such as Status are kept up-to-date via the heartbeat.

GUI item

Description

Type

Displays the configured Type.

Mode

Displays the configured HA mode.

Refresh

(button)

Click to get the newest data and display it on System > High Availability > Status.

If the display does not refresh, you may need to click Clear Cache first.

Failover

(button)

Select a FortiMail unit or cluster, and then click this button to manually trigger a failover.

Restore

(button)

Select a FortiMail unit or cluster, and then click to manually restart HA and reset Effective to match the unit's initially configured Member role.

Caution: When a failed unit reboots, don't click Restore until it finishes synchronizing its mail queue and other data with the current primary. If this recovery mechanism is interrupted, data could be lost. For details, see Status and Synchronize mail data directory.

Sync

(button)

Select a FortiMail unit or cluster, and then click to manually initiate configuration synchronization with other FortiMail units in the HA cluster or group of clusters. See also Settings that are not synchronized by HA.

Clear Cache

(button)

Click to reload the heartbeat daemon and its status data to show current information.

Name

Name of the unit and, if there are multiple HA clusters, the Group name.

SN

Serial number.

IP

IPv4 address (or IPv6 address).

Version

Firmware version. A FortiMail unit must run the same firmware version as the other units in order to join the HA group, so that the configuration can be synchronized. Exceptions are during updates. See Upgrading firmware on HA units.

Configured

See Combinations of configured and effective HA role.

In active-active HA, the secondary unit that is the Primary backup (if configured) will display Secondary, like other secondary units.

Effective

See Combinations of configured and effective HA role.

After a failure has been detected, this status may not match the initially configured Member role. To return to that role, click Restore.

Status

Displays the status of HA cluster joining, heartbeat, and synchronization. See also Combinations of configured and effective HA role.

  • Running — Normal HA operation. Recent synchronization was successful.

  • Starting — HA processes are starting. This briefly occurs when you enable HA, before the unit joins a cluster.

  • Restarting — The unit is rebooting. Other units in the HA cluster will wait a little longer than the usual HA heartbeat interval in order to allow the reboot to complete without triggering a failover.

  • Paused — HA synchronization has been either manually (see the FortiMail CLI Reference) or automatically paused (while the other unit is Restarting). Normal on other units while you upgrade the primary unit.

  • Stopping — HA processes are shutting down.

  • Unseen — Heartbeats have not yet been detected for this unit, and therefore the unit is not yet joined to the HA cluster. Verify that the unit is powered on, HA is enabled, and the heartbeat interfaces are reachable by other units in the cluster.

  • Vanished — Heartbeat was detected for this unit but then disappeared. Verify that the unit is powered on, HA is enabled, and the heartbeat interfaces are reachable by other units in the cluster.

  • Build Mismatch — Other units in the HA cluster do not have the same firmware version. Configuration synchronization requires that all units in the cluster have the same firmware, since different versions may support different features. Normal only while you upgrade firmware in the cluster.

  • Bad Config — HA processes could not start because the configuration was not valid. Verify the HA settings such as the shared password and heartbeat interfaces.

  • Checking — Running a manually requested configuration checksum verification. If there are errors, it starts a configuration sync. Normal while you run the CLI command diag sys ha sync-status.
  • Config Failed — Unrepairable configuration mismatch was detected. Verify the settings that HA does not synchronize. See Settings that are not synchronized by HA.
  • Failed — HA processes have an internal error. If this occurs again, contact Fortinet Technical Support.
  • Asking for Snapshot — The unit is requesting a configuration checksum from the other units in the cluster or group of clusters. If the checksum is different, then the configurations are out-of-sync. Units in the HA cluster must download the new configuration file.

  • Getting Snapshot — The unit is downloading a configuration snapshot.

  • Resyncing — The unit is trying to synchronize its configuration with other units in the cluster. Usually this occurs after downloading a configuration snapshot, or if it detects a checksum error.

  • Synchronized — Configuration is synchronized.

Up Time

Amount of time that the HA cluster member has been operational.

Last Seen

When this FortiMail unit’s HA daemon last communicated with the others in the HA group to make sure that they are available. See also Heartbeat lost threshold and HA base port.
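
When you review many units, it can help to sort the Status values listed above into those that are normal, those that usually pass on their own, and those that call for a check. The grouping in the Python sketch below is an interpretation of the descriptions in this table, not an official classification, and the names are hypothetical.

```python
# Grouping is an assumption based on the status descriptions above.
NORMAL = {"Running", "Synchronized"}
TRANSIENT = {
    "Starting", "Restarting", "Paused", "Stopping", "Checking",
    "Asking for Snapshot", "Getting Snapshot", "Resyncing",
}
NEEDS_ATTENTION = {
    "Unseen", "Vanished", "Bad Config", "Config Failed", "Failed",
    # Normal only while you upgrade firmware in the cluster.
    "Build Mismatch",
}


def triage(status: str) -> str:
    """Return a rough triage category for a value of the Status column."""
    if status in NORMAL:
        return "normal"
    if status in TRANSIENT:
        return "transient"
    if status in NEEDS_ATTENTION:
        return "needs attention"
    return "unknown status"


for value in ("Running", "Resyncing", "Vanished"):
    print(f"{value}: {triage(value)}")
```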

See also

Centrally monitoring the HA cluster

About HA heartbeat and synchronization

About logging, alert email, and SNMP for HA

Configuring HA

Service Monitor section

Combinations of configured and effective HA role

When a FortiMail HA unit detects a heartbeat or synchronization failure, it adapts, and so it may no longer be operating in its initially configured Member role.

Combinations of the Configured and Effective columns on System > High Availability > Status indicate whether the unit has joined the HA cluster and is operating normally. The Status column may indicate troubleshooting information.

Configured

Effective

Result

Primary

Primary

Normal for the primary.

Secondary

Secondary

Normal for the secondary.

In active-active HA, however, this can also happen if the primary has failed. (Most of the secondaries continue to show Secondary. Only the unit where you enabled Primary backup has Effective showing Primary.)

Primary
or Secondary

Discovering

Initial HA configuration is complete. The primary is now trying to connect with other HA units to form a heartbeat link.

Primary
or Secondary

Registering

Heartbeat connection succeeded and the unit is joining the cluster.

Primary
or Secondary

Unknown

Initial HA configuration was not able to complete. Therefore the unit could not try to join an HA cluster or group. For example, if the primary is defined, but not the other units, then HA cannot form a heartbeat link yet. This situation should correct itself once all units are configured.

Primary
or Secondary

Off

Either the:

  • heartbeat has failed, and Action on failure is Switch off immediately
  • HA process is starting

and the heartbeat and configuration synchronization are currently stopped.

After the secondary joins an HA cluster or group, some causes such as network interruptions could cause the first configuration synchronization to fail. To prevent both the secondary and primary from simultaneously acting as primary ("split brain"), Effective temporarily becomes Off. If the next synchronization fails again, then the secondary's Effective becomes Primary.

To restart HA processes and return the unit to the originally configured role, click Restore.

Primary
or Secondary

Hold Off

The primary is rebooting or upgrading firmware. It asked to wait longer than the usual Heartbeat lost threshold so that the reboot can complete. If the primary does not return, then the secondary performs the action in Action on failure or Primary backup.

Primary

Failed

Remote service monitoring, or local hard drive, or network interface monitoring has detected a failure. If operating in transparent mode, then on System > Network > Interface, the network interface IP/Netmask on the secondary displays Bridging (waiting for recovery).

When you correct the failure, Effective changes to either Secondary or Primary, depending on Action on failure.

Primary

Secondary

The primary failed. A secondary automatically became the new primary. When the failed unit restarted, it detected that there was already a primary in the HA cluster or group, and so now the failed unit is the new secondary.

If you want the failed unit to return to acting as the primary, click Restore.

Secondary

Primary

The secondary detected that the primary failed, and then the secondary became the new primary.

If you want it to return to acting as the secondary, click Restore.

Secondary

Secondary (No Primary)

The secondary detected that the primary failed, but it was not configured as Primary backup. Therefore configuration synchronization cannot occur until you either repair the primary, or manually configure a secondary to become the new primary.

This occurs only if HA mode is Active-Active.
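
The table above can also be read as a lookup from the Configured and Effective columns to a short explanation. The Python sketch below encodes the rows in abbreviated form for quick reference. The function and dictionary names are hypothetical, and the summaries are shortened from the table, so consult the table for the full details.

```python
# Rows that depend on both the Configured and Effective columns.
BY_PAIR = {
    ("Primary", "Primary"): "Normal for the primary.",
    ("Secondary", "Secondary"): "Normal for the secondary.",
    ("Primary", "Failed"): "Service, hard drive, or interface monitoring detected a failure.",
    ("Primary", "Secondary"): "The primary failed and rejoined as a secondary.",
    ("Secondary", "Primary"): "The secondary took over after the primary failed.",
    ("Secondary", "Secondary (No Primary)"): "The primary failed and no primary backup took over.",
}

# Rows where Configured can be either Primary or Secondary.
BY_EFFECTIVE = {
    "Discovering": "Trying to form a heartbeat link.",
    "Registering": "Heartbeat connected; the unit is joining the cluster.",
    "Unknown": "Initial HA configuration could not complete.",
    "Off": "Heartbeat and configuration synchronization are stopped.",
    "Hold Off": "The primary is rebooting or upgrading firmware.",
}


def explain(configured: str, effective: str) -> str:
    """Return a short explanation for one row of the status table."""
    if (configured, effective) in BY_PAIR:
        return BY_PAIR[(configured, effective)]
    if configured in ("Primary", "Secondary") and effective in BY_EFFECTIVE:
        return BY_EFFECTIVE[effective]
    return "Combination not listed in the table."


print(explain("Secondary", "Primary"))
print(explain("Primary", "Hold Off"))
```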

See also

About HA heartbeat and synchronization

Monitoring HA status

Configuring HA

Service Monitor section