Administration Guide

Using high availability (HA)

Go to System > High Availability to configure the FortiMail unit to act as a member of a high availability (HA) cluster in order to increase processing capacity and/or availability, so that your deployment is still up even if some hardware fails.

This section contains the following topics:

About HA modes

FortiMail HA can operate in either:

  • active-passive mode
  • active-active mode

Active-passive HA                            | Active-active HA
2 FortiMail units in the HA group            | 2-24 FortiMail units in the HA group
Typically deployed behind a switch           | Typically deployed behind a load balancer
Both configuration* and data synchronized^   | Only configuration* synchronized
Only primary unit processes email            | All units process email
No data loss^ when hardware fails            | Data loss when hardware fails
No increased processing capacity             | Increased processing capacity

* For exceptions, see Settings that are not synchronized by HA.

^ For exceptions, see Synchronization of MTA queue directories after a failover.

Figure: Active-passive HA group operating in gateway mode (only the primary unit processes email)

Figure: Active-active HA group operating in gateway mode (all available units process email)

When a FortiMail unit fails, current SMTP sessions are interrupted. SMTP clients usually handle this gracefully and start a new connection. Traffic is redirected away from the failed FortiMail unit by methods that vary by HA mode:

  • Active-passive: The secondary unit starts to use the virtual IP address of the failed unit, and uses ARP to automatically notify the switch or router that traffic should now be redirected to its network interface instead.
  • Active-active: The load balancer stops sending email connections to failed FortiMail units. Only live FortiMail units continue to receive connections.
Note

You can mix different FortiMail models in the same HA group. However:

  • All FortiMail units in the HA group must have the same firmware version.
  • Capacity and maximum configuration values are limited by the least powerful model.

To configure FortiMail units in an HA group, you usually connect only to the primary unit. The primary unit’s configuration is almost entirely synchronized to secondary units, so that changes made to the primary unit are propagated to the secondary units.

For exceptions, see Settings that are not synchronized by HA.

Note

To use FortiGuard Antivirus or FortiGuard Antispam with HA, you must license all FortiMail units in the cluster. Only licensed devices can use the subscription services.

See also

Configuring an HA group

About the HA heartbeat and synchronization

About logging, alert email, and SNMP for HA

Storing mail data from HA groups on a NAS server

Example: Failover scenarios

Example: Active-passive HA group in gateway mode

About the HA heartbeat and synchronization

Heartbeat and synchronization through the primary and secondary heartbeat network interfaces:

Note

Synchronization intervals vary.

  • FortiGuard Antispam and FortiGuard Antivirus packages: Not synchronized.
  • Mail queue: Up to 20 minutes (not real time).
  • Configuration: Real time.

If configuration synchronization did not occur when expected, or if you have inadvertently de-synchronized the secondary unit’s configuration (for example, if a cable was accidentally disconnected), then you can manually initiate synchronization via GUI or the CLI command diagnose system ha sync on either the primary unit or the secondary unit.

Periodically, the secondary unit verifies that all configuration changes have been synchronized. If they have not, then the secondary unit will pull the configuration changes from the primary unit and reload the new configuration.

Secondary units can also push any changes made to their block and safe lists back to the primary unit. In active-active HA, these changes are then synchronized to all other secondary units.

The secondary unit expects to constantly receive heartbeat traffic from the primary unit. Loss of the heartbeat signal indicates failure of the primary unit, and triggers the action that you select in On failure. For details, see Example: Failover scenarios.

Exceptions include system restarts and the execute reload CLI command. If the primary unit reboots or reloads its configuration, then it signals to the secondary unit to wait for the primary unit to complete the restart or reload. For details, see Failover scenario 2: System reboot or reload of the primary unit.

Behavior when the heartbeat signal is lost varies by HA mode and On failure:

  • Active-passive: The secondary unit becomes the new primary unit and starts receiving email connections. Some in-progress email connections may be interrupted and must be restarted, but most email clients and servers can gracefully handle this.

  • Active-active: If Primary backup has been selected, then your preferred backup unit will take over the role of the primary unit (Effective role becomes Primary).

    If a specific Primary backup is not selected, then each secondary unit continues to operate as a secondary unit. However, with no primary unit, changes to the configuration are not synchronized anymore.
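The rules above can be restated as a short Python sketch. It is illustrative only: the function name, parameter names, and string values are hypothetical and are not part of any FortiMail API, and the sketch does not model the On failure setting.

```python
def role_after_heartbeat_loss(ha_mode: str, is_primary_backup: bool = False) -> str:
    """Return the effective role of a secondary unit after it loses the heartbeat.

    Hypothetical helper that only restates the rules in this section.
    """
    if ha_mode == "active-passive":
        # The single secondary unit takes over as the new primary unit.
        return "primary"
    if ha_mode == "active-active":
        # Only the unit selected as Primary backup takes over. Other
        # secondary units keep running, but configuration changes are
        # no longer synchronized while there is no primary unit.
        return "primary" if is_primary_backup else "secondary"
    raise ValueError(f"unknown HA mode: {ha_mode!r}")
```

For example, `role_after_heartbeat_loss("active-active", is_primary_backup=True)` returns `"primary"`, which corresponds to Effective role becoming Primary on the preferred backup unit.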

For failover examples and steps required to restore the initially configured roles in each case, see Example: Failover scenarios.

Interface monitoring, hard drive monitoring, and remote service monitoring do not provide configuration and data synchronization, and therefore they are not a complete replacement for the heartbeat. However, you can use them as another way to detect failure. See Interface section and Service Monitor section.

See also

About HA modes

About HA port numbers and protocols

About logging, alert email, and SNMP for HA

Settings that are not synchronized by HA

Storing mail data from HA groups on a NAS server

Synchronization of MTA queue directories after a failover

Example: Active-passive HA group in gateway mode

Example: Failover scenarios

About HA port numbers and protocols

The default protocol and port numbers for HA heartbeat, synchronization, and service monitoring communications are configurable. See HA base port, the control-packet-option setting in the FortiMail CLI Reference, and Appendix C: Port Numbers.

Note

If a firewall is between the primary and secondary FortiMail unit, then verify that the firewall policy allows HA port numbers. Blocked HA ports can cause incorrect failover and synchronization failure.

Settings that are not synchronized by HA

All settings on the primary unit are synchronized to the secondary unit, except the following:

Settings

Explanation

Operation mode

You must set the operation mode (gateway, transparent, or server) of each HA group member before configuring HA. Many settings vary by operation mode, and therefore configurations cannot be synchronized if the operation mode is different.

Host name

Different host names are used to distinguish members of the HA cluster when connecting to the GUI and to indicate which unit failed. For details, see Hostname.

Static route

Static routes are not synchronized because some or all of the network interfaces on each FortiMail unit in the HA cluster may be connected to different subnets. See also Configuring static routes.

Interface configuration

(gateway and server mode only)

Administrator connections to the GUI/CLI, alert email, and many other features require that you configure at least one network interface with an IP address. For details, see Configuring the network interfaces.

Exceptions include virtual IP addresses on active-passive HA. Virtual IP addresses are synchronized because, upon failover, the secondary unit must start to use them. This mechanism allows the secondary unit to receive connections instead of the failed primary unit. See Virtual IP address (or Virtual IPv6 address).

Management IP address

(transparent mode only)

Each FortiMail unit in the HA cluster should be configured with different management IP addresses for GUI and CLI connectivity purposes. For details, see About the management IP.

SNMP system information

Each FortiMail unit in the HA cluster will have its own SNMP system information, including the Description, Location, and Contact. For details, see Configuring SNMP queries and traps.

RAID configuration

RAID settings are hardware-dependent and determined at boot time by looking at the drives (for software RAID) or the controller (hardware RAID), and are not stored in the system configuration. Therefore, they are not synchronized.

Some HA settings

Product name and icon

The product name and icon under System > Customization > Appearance are not synchronized. All other appearance settings are synchronized.

Miscellaneous settings
(active-active HA only)

In active-active HA, the following settings are not synchronized:

All system, domain, and user level block/safe lists are synchronized.

Note

User data is synchronized at predefined time intervals, not in real time.

See also

About the HA heartbeat and synchronization

Synchronization of MTA queue directories after a failover

During normal operation in active-passive HA, email messages are either:

  • being received or sent by the primary FortiMail unit
  • waiting to be delivered in the mail queue
  • stored in the primary unit’s mail data directories (email quarantines, email archives, and email inboxes of server mode)

When a failure occurs, sending and receiving is interrupted. The delivery attempt fails, and the sender usually retries to send the email message. However, stored messages remain in the primary unit’s mail data directories.

To prevent data loss when a primary unit fails, you usually should enable Synchronize mail data directory (unless NAS storage is used), but do not need to enable Synchronize MTA queue directory. This is because of an automatic recovery mechanism in FortiMail HA failover.

  1. The secondary or primary backup unit detects that the primary unit has failed, and becomes the new primary unit.

  2. If the former primary unit can reboot, it detects the new primary unit, and becomes a secondary unit.

    Note

    Depending on the On failure setting, you may be required to click Restart HA on a failed primary unit.

  3. The former primary unit pushes its mail queue to the new primary unit.

    This synchronization occurs through the heartbeat link between the primary and secondary units, and prevents duplicate email messages from forming in the primary unit’s mail queue.

  4. The new primary unit delivers email in its mail queues, including email messages synchronized from the new secondary unit.

As a result, if the failed primary unit can restart, no email is lost from the mail queue.
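The recovery steps above can be sketched in Python as a queue merge. This is a simplified model, not FortiMail's implementation: the function name is hypothetical, and identifying duplicates by a message ID is an assumption, since this guide only states that the synchronization prevents duplicate messages.

```python
def merge_mail_queues(new_primary_queue, former_primary_queue):
    """Merge the former primary unit's queue into the new primary unit's queue.

    Each queue is a list of (message_id, payload) tuples. A message whose ID
    is already queued on the new primary unit is skipped, so the merged queue
    contains no duplicates. (Deduplication by ID is an assumption.)
    """
    merged = list(new_primary_queue)
    seen = {message_id for message_id, _ in merged}
    for message_id, payload in former_primary_queue:
        if message_id not in seen:
            merged.append((message_id, payload))
            seen.add(message_id)
    return merged
```

After the merge, the new primary unit would deliver everything in the merged queue, which corresponds to step 4.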

Even if you choose to synchronize the mail queue, because its contents change very rapidly and synchronization is periodic, there is a chance that some email will not have been synchronized when a failover occurs.

See also

About the HA heartbeat and synchronization

Storing mail data from HA groups on a NAS server

Storing mail data from HA groups on a NAS server

If you have FortiMail units operating in server mode and in an active-active HA group, you must store mail data centrally on a network attached storage (NAS) server — not on each FortiMail unit. Otherwise email users’ messages and other mail data could be scattered across multiple FortiMail units.

For other HA and operating modes, however, it still may be better to store mail data on a NAS server.

For example, regular NAS server backups help to prevent mail data loss, even if a FortiMail unit has hardware failure. Also, during a temporary failure of a FortiMail unit, you can still access the mail data on the NAS server. When the FortiMail unit restarts, it can usually continue to access and use the mail data stored on the NAS server.

For active-active HA with a NAS server, only the primary unit sends quarantine reports to email users. The primary unit also acts as a proxy between email users and the NAS server when email users use FortiMail webmail to access quarantined email and to configure their own Bayesian filters.

For active-passive HA groups, the primary unit reads and writes all mail data to and from the NAS server in the same way as a standalone unit. If a failover occurs, the new primary unit uses the same NAS server for mail data. The new primary unit can access all mail data that the original primary unit stored on the NAS server. So if you are using a NAS server to store mail data, after a failover, the new primary unit continues operating with no loss of mail data.

Note

If the FortiMail unit is a member of an active-passive HA group, and the HA group stores mail data on a remote NAS server, disable mail data synchronization to prevent duplicate mail data traffic.

For instructions on storing mail data on a NAS server, see Selecting the mail data storage location.

See also

About the HA heartbeat and synchronization

Synchronization of MTA queue directories after a failover

About logging, alert email, and SNMP for HA

For faster discovery and diagnosis of network problems that have caused an HA failover, you can configure SNMP, Syslog, and/or alert email to monitor the HA cluster.

To configure logging and alert email, configure the primary unit and enable HA events. When the configuration changes are synchronized to the secondary units, all FortiMail units in the HA group record their own separate log messages and send separate alert email messages. Log data is not synchronized.

Note

To distinguish alert email from each member of the HA cluster, configure a different host name for each member. For details, see Hostname.

To use SNMP to monitor HA failover, configure each cluster member to enable HA events for the SNMP community, such as:

See also

Configuring SNMP queries and traps

Logs, reports, and alerts

About the HA heartbeat and synchronization

Configuring an HA group

To deploy FortiMail units as a high availability (HA) cluster, perform the following steps in order.

To deploy an HA group

  1. Register all FortiMail units in the HA cluster with the Fortinet Technical Support web site:

    https://support.fortinet.com/

    If you use licensed features such as centralized HA monitoring, FortiGuard Antivirus, and/or FortiGuard Antispam, also purchase and register licenses for all units.

  2. Connect the network interfaces that will be used for the heartbeat and synchronization between FortiMail units in the HA cluster. At least one heartbeat link is required.

    For example, you could use a network cable to connect FortiMail A's port2 to FortiMail B's port2.

    Note

    Don't disconnect the heartbeat once HA is enabled. If the heartbeat is accidentally interrupted for an active-passive HA group, such as when a network cable is temporarily disconnected, the secondary unit will assume that the primary unit has failed, and become the new primary unit. If no failure has actually occurred, both FortiMail units will be operating as primary units at the same time. This can cause an IP address conflict. In active-active HA groups, configuration synchronization can be disrupted. For details on correcting this, see Restore to configured role.

    Caution

    For better heartbeat reliability, create two heartbeat links: a primary and a secondary. Directly link the pair of heartbeat ports with an Ethernet crossover cable, or connect them through a dedicated local switch that is not connected to your overall network. This ensures enough bandwidth and low latency for the synchronization and heartbeat. If the heartbeat is interrupted, then a failover may occur. See also About the HA heartbeat and synchronization.

  3. If you are making an active-passive HA group, and the operation mode is gateway or server, add a Virtual IP address (or Virtual IPv6 address) and Virtual hostname to the network interface that will receive email connections. Update DNS records to use this virtual IP address, not the physical IP address. Wait for the DNS records to propagate to non-authoritative DNS servers before you enable HA.
  4. If you are making an active-active HA group, configure storage of mail data on a NAS server. See Storing mail data from HA groups on a NAS server. (Active-passive members can also benefit from a NAS server, but do not require it.)

    Caution

    For active-active HA, if the FortiMail unit is operating in server mode, you must store mail data externally on a NAS server. Failure to store mail data externally could result in mailboxes and other data scattered over multiple FortiMail units.

  5. On each member of the HA group, go to System > High Availability > Configuration and:

    1. Configure the following:

      GUI item

      Description

      State

      Enable or disable HA.

      HA mode

      Select either Active-Active or Active-Passive. For details, see About HA modes.

      On failure

      Select what the HA group will do when it detects a failure, either:

      • Switch off immediately: On recovery, do not process email or join the HA group until you manually select the Effective role (see Restart HA and Restore to configured role).
      • Wait for recovery: On recovery, the failed primary unit’s Effective role becomes Secondary. To manually restore the FortiMail unit to acting in its configured Role, see Restore to configured role.
      • Wait for recovery and switch to configured role: On recovery, the failed primary unit's Effective role automatically becomes Primary again, and the secondary unit that was temporarily acting as primary automatically becomes Secondary again. This option may be useful if the cause of failure is temporary and rare, but may cause problems if the cause of failure is recurring, resulting in many extra role changes.

      Tip: In most cases, you should select Wait for recovery.

      Shared password

      Enter an HA password for the HA group members.

      Before HA group members synchronize with each other, they verify that they have the same shared password. This prevents them from accidentally synchronizing with FortiMail units that do not belong to the same cluster. Therefore you must add the shared HA password to each unit in the HA group.

    2. Expand the Member section. For each FortiMail unit in the HA group, click New and configure the following:

      GUI item

      Description

      Role

      Select the role of the FortiMail unit in the HA group, either Primary or Secondary.

      Each HA group member's role is not synchronized because this distinguishes the primary and secondary units.

      Effects of the role vary by HA mode. See About HA modes.

      IPv4 address
      (or IPv6 address)

      Enter the IP address of the network interface that will listen for the heartbeat and synchronization on the primary or secondary (depending on which entry you are currently configuring in the table).

      If you want more heartbeat interfaces, click + and then add those IP addresses.

      Alternatively, if you are currently configuring the device that you are adding to the table, click Use Current Device.

      Note: You must also bring up and then enable Heartbeat status on the interface. If it is disabled, but the IP address is configured here, then HA will detect that the heartbeat link has failed.

      Hostname

      Displays the hostname of the primary or secondary (depending on which entry you are currently configuring in the table).

      Note: Do not configure the hostname here. It will not update the hostname used by the FortiMail unit's SMTP relay/proxy. Instead, configure Host name in the mail settings and Virtual hostname, and then click Use Current Device to automatically paste the hostname into this field.

      Primary backup

      (Active-active secondary units only)

      If HA mode is Active-Active, then there can be many secondary units. Enable this setting if Role is Secondary, and you want to select this member to become the new primary when a failure is detected.

      Note: Usually you should have a primary backup. Otherwise configuration synchronization will be interrupted upon failure. See About the HA heartbeat and synchronization.

      Comment

      Optional. Enter a descriptive comment.

    3. If the HA group is active-passive, configure the Virtual IP address (or Virtual IPv6 address) that will transfer upon failover.
    4. If the HA group stores mail data on NAS, disable Synchronize mail data directory.

    5. Optionally, configure:

    6. Click Apply on the primary unit, and then on the secondary units.
  6. If the HA group is active-active, configure the load balancer with either remote service monitoring or interface monitoring to detect failed FortiMail units, and to redirect connections to available FortiMail units.
  7. Monitor the status of each cluster member. For details, see Monitoring HA status, Logs, reports, and alerts, and Centrally monitoring the HA cluster.
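The three On failure options in step 5 differ mainly in what a failed primary unit does when it recovers. The mapping below is a minimal Python sketch of the descriptions in that table. The dictionary, function name, and role strings are hypothetical, and "off" stands for a unit that stays out of the HA group until an administrator intervenes.

```python
# Effective role of a failed primary unit on recovery, per On failure option.
ON_FAILURE_RECOVERY_ROLE = {
    # Does not process email or rejoin until the role is selected manually.
    "switch off immediately": "off",
    # Rejoins as a secondary unit; restoring the configured role is manual.
    "wait for recovery": "secondary",
    # Automatically returns to its configured Primary role.
    "wait for recovery and switch to configured role": "primary",
}


def effective_role_on_recovery(on_failure: str) -> str:
    """Look up the recovery role for an On failure option (case-insensitive)."""
    return ON_FAILURE_RECOVERY_ROLE[on_failure.strip().lower()]
```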

See also

About HA modes

About the HA heartbeat and synchronization

Settings that are not synchronized by HA

Example: Active-passive HA group in gateway mode

Example: Failover scenarios

Advanced Option section

  1. Go to System > High Availability > Configuration.

  2. Expand the Advanced Option section.

  3. Configure the following and then click Apply:

    GUI item

    Description

    Synchronize mail data directory

    (Active-Passive only)

    Enable if the HA group does not store its mail data on a NAS server, in order to synchronize system quarantine, per-recipient quarantines, email archives, email users’ preferences, and (server mode only) mailboxes with the HA group members. See Storing mail data from HA groups on a NAS server.

    If mail data changes frequently, you can manually initiate a data synchronization when significant changes are complete. For details, see Start configuration sync.

    Synchronize MTA queue directory

    (Active-Passive only)

    Enable if you want to synchronize the mail queue with the HA group members.

    Caution: If the primary unit experiences a hardware failure and you cannot restart it, and if this option is disabled, MTA queue directory data could be lost.

    Note: Enabling this option can affect the FortiMail unit’s performance, because periodic synchronization of the mail queue can be processor and bandwidth-intensive. Additionally, because the content of the MTA queue directories is very dynamic, periodically synchronizing MTA queue directories between FortiMail units may not guarantee against loss of all email in those directories. Even if MTA queue directory synchronization is disabled, after a failover, a separate synchronization mechanism may successfully prevent loss of MTA queue data. For details, see Synchronization of MTA queue directories after a failover and Managing the mail queue.

    HA base port

    Enter the first of multiple port numbers (see Appendix C: Port Numbers) that will be used for:

    • heartbeat signals
    • synchronization control
    • data synchronization
    • configuration synchronization
    Note

    For both active-active and active-passive HA, in addition or alternatively to configuring the heartbeat, you can configure service monitoring. For details, see Service Monitor section and About the HA heartbeat and synchronization.

    Note

    In addition to automatic immediate and periodic configuration synchronization, you can also manually initiate synchronization. For details, see Start configuration sync.

    Heartbeat lost threshold

    Enter the total amount of time, in seconds, that a FortiMail unit can be unresponsive before HA detects a failure and performs the action in On failure.

    Tip: The heartbeat verifies availability every 1 second. To prevent unnecessary failover when the primary unit is temporarily experiencing very heavy load and therefore heartbeat responses are slow, configure a longer threshold (for example, 3 seconds or more) to allow the secondary unit enough time to send more heartbeat signals to confirm unresponsiveness. To determine the best heartbeat threshold, it is useful to know your FortiMail unit's performance baseline and peaks. See also Establish a system baseline and Troubleshoot resource issues.

    Note

    If you have service level agreements (SLA), then you may be required to keep this time short. If the failure detection time is too long, email delivery could be delayed or fail until HA detects the failure. This reduces service uptime.

    Remote services as heartbeat

    Enable to avoid the On failure action if both the primary and secondary heartbeat links temporarily fail, but remote service monitoring detects that the FortiMail unit is still available.

    Note

    The On failure action can still occur if the HA process restarts due to system reboot or HA daemon restart. Then it examines the physical heartbeat links first. If they are not found, then failure is detected.

    This setting provides an extra HA heartbeat only, not synchronization. To avoid synchronization problems, do not use remote service monitoring as a heartbeat for a long time. This feature is intended only as a temporary heartbeat until you reestablish a normal primary or secondary heartbeat link.
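To illustrate how Heartbeat lost threshold relates to the 1-second heartbeat interval, here is a minimal Python sketch of threshold-based failure detection. The function names are hypothetical, and treating the threshold as a strict "greater than" comparison is an assumption. The actual HA daemon logic is not described in this guide.

```python
def heartbeat_lost(last_heartbeat: float, now: float, threshold_seconds: float) -> bool:
    """Return True if no heartbeat has arrived within the threshold.

    Times are in seconds (for example, values from time.monotonic()).
    """
    return (now - last_heartbeat) > threshold_seconds


def missed_heartbeats(last_heartbeat: float, now: float, interval_seconds: float = 1.0) -> int:
    """Count whole heartbeat intervals that have elapsed without a heartbeat."""
    return int((now - last_heartbeat) // interval_seconds)
```

With a threshold of 3 seconds, a peer that was last heard from 2.5 seconds ago is still considered alive, so a briefly overloaded primary unit does not trigger the On failure action. A shorter threshold detects real failures sooner but tolerates fewer slow responses.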

Interface section

In a basic HA deployment, the heartbeat interface provides a basic signal to other HA group members about the health of the primary FortiMail unit. However, you can use additional signals. Interface monitoring periodically tests the local network interfaces on the primary unit. If a malfunctioning interface is detected, HA performs the action configured in On failure.

  1. Optionally, configure the interface monitoring interval and failure detection threshold. See Service Monitor section.
  2. Go to System > High Availability > Configuration.

  3. Expand the Interface section.

  4. Select a row for a network interface in the table, and then click Edit.

  5. Configure the following settings:

    GUI item

    Description

    Heartbeat status

    Enable if this interface will listen for HA heartbeat and synchronization communications.

    Note

    You must enable at least one of the heartbeat interfaces that you defined in IPv4 address (or IPv6 address). Otherwise HA will detect a failure.

    Port

    Displays the name of the network interface that you are configuring.

    Optionally, you can click the name to view or configure its settings. See also Configuring the network interfaces.

    Virtual IP address (or Virtual IPv6 address)

    Enter a virtual IP address that the primary unit will have on this network interface. Upon failure detection, the secondary will become the new primary and start to use the virtual IP address.

    For gateway mode and server mode deployments, DNS records should be configured to point to the virtual IP address, not physical IP addresses. See also About HA modes, Configuring the network interfaces, and About IPv6 Support.

    This setting is available only if HA mode is Active-Passive.

    Note

    The interface IP address must be different from, but on the same subnet as, the IP addresses of the other heartbeat network interfaces of other members in the HA group.

    When configuring other FortiMail units in the HA group, use this value as the:

    • Remote peer IP (for active-passive HA)
    • Primary configuration (for secondary units in active-active HA)
    • Peer systems (for the primary unit in active-active HA)

    Virtual hostname

    Enter a virtual hostname.

    Similar to behavior with the virtual IP address, the virtual hostname belongs to the current primary unit. Upon failover, the secondary unit becomes the new primary unit, and so it starts to use the virtual hostname instead.

    This setting is available only if HA mode is Active-Passive.

    Enable port monitor

    Enable to monitor a physical network port for failure. If the port fails, a failure is detected by the HA cluster.

Service Monitor section

In the simplest HA deployments, failed FortiMail units are detected by an interrupted heartbeat. However, HA can also detect failure of hardware and network services. Heartbeats detect the general responsiveness of a primary unit, but do not test each daemon (for example, POP3 or webmail service), hard drive, and physical network ports used by non-heartbeat traffic. Therefore you can add hardware and service monitoring to be more specific. Alternatively, if the heartbeat link is briefly disconnected, remote services monitoring can prevent an unnecessary failover by temporarily acting as a secondary heartbeat.

With remote service monitoring, the secondary unit connects to the SMTP, POP3, and/or web service (HTTP) on the primary unit to detect failure. For server mode, IMAP service can also be monitored.

With local network interface monitoring and hard drive monitoring, the primary unit monitors its own network interfaces and hard drives. Hard drive monitoring tests that the local hard drive is still accessible, and disk space exists for mail data. If the hard disk is not responsive, or if the mail data disk is 95% full, then a failure is detected.
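The hard drive test described above can be approximated with a short Python sketch. It is only an illustration of the two stated conditions (unresponsive disk, or mail data disk 95% full). The function names are hypothetical, and treating exactly 95% as full is an assumption.

```python
import shutil


def mail_disk_failed(total_bytes: int, used_bytes: int,
                     responsive: bool = True, full_percent: int = 95) -> bool:
    """Return True if the disk is unresponsive or at least full_percent full."""
    if not responsive or total_bytes <= 0:
        return True
    # Integer comparison avoids floating-point rounding at the boundary.
    return used_bytes * 100 >= total_bytes * full_percent


def check_path(path: str) -> bool:
    """Apply the same test to a real mount point on the local machine."""
    try:
        usage = shutil.disk_usage(path)
    except OSError:
        # The path could not be queried, so treat the disk as unresponsive.
        return True
    return mail_disk_failed(usage.total, usage.used)
```

For example, `check_path("/")` reports whether the root file system of the machine running the script would count as failed under these rules.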

Network interface monitoring tests all network interfaces where:

Alert email, log messages, and SNMP traps (if configured) indicate the specific cause.

For example, if service monitoring detects failure of port2 on the primary unit, it records this log message:

date=2005-11-18 time=18:20:31 device_id=FE-4002905500194 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: local problem detected (port2), shutting down"

and sends this alert email:

Subject: monitord: local problem detected (port2), shutting down [primary-host-name]

This is the FortiMail HA unit at 10.0.0.1.

A local problem (port2) has been detected, telling remote to take over and shutting down.
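Log messages in this key=value format are straightforward to parse when you forward them to your own tooling. The following Python sketch splits the example log message above into fields. The function name is hypothetical, and the sketch assumes that values containing spaces are always enclosed in double quotes, as in the example.

```python
import shlex

SAMPLE = (
    'date=2005-11-18 time=18:20:31 device_id=FE-4002905500194 '
    'log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha '
    'action=unknown status=success '
    'msg="monitord: local problem detected (port2), shutting down"'
)


def parse_log_line(line: str) -> dict:
    """Split a key=value log line into a dictionary of fields."""
    fields = {}
    # shlex.split keeps a double-quoted value together as one token.
    for token in shlex.split(line):
        key, separator, value = token.partition("=")
        if separator:
            fields[key] = value
    return fields
```

A monitoring script could then filter on `fields["subtype"] == "ha"` and inspect `fields["msg"]` to find the interface named in the failure.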

To configure hardware and service monitoring

  1. Go to System > High Availability > Configuration.

  2. Expand the Service Monitor section.

  3. Select a row in the table and click Edit.

    For Remote SMTP, Remote IMAP, Remote POP, and Remote HTTP services, configure the following and click OK:

    GUI item

    Description

    Enable

    Enable or disable monitoring for the selected service.

    Name

    Displays the service name.

    Port

    Enter the listening port number of the service on the primary FortiMail and (active-active HA only) secondary. See also Appendix C: Port Numbers.

    Timeout

    Enter the amount of time in seconds to wait for a response to the connection.

    Interval

    Enter the time in seconds between each test.

    Retries

    Enter the number of consecutively failed tests that indicate a failure.

    For interface monitoring, configure the following and click OK (to configure which ports are monitored, see Interface section):

    GUI item

    Description

    Interval

    Enter the time in seconds between each test.

    Retries

    Enter the number of consecutively failed tests that indicate a failure.

    For local hard drive monitoring, configure the following and click OK:

    GUI item

    Description

    Enable

    Enable or disable monitoring of the local hard drive.

    Interval

    Enter the time in seconds between each test.

    Retries

    Enter the number of consecutively failed tests that indicate a failure.
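The relationship between Interval and Retries can be illustrated with a short sketch. It assumes that a failure is declared only when the configured number of tests fail in a row, and that any successful test resets the count. The class name is hypothetical, and this is not FortiMail's actual implementation.

```python
class ConsecutiveFailureDetector:
    """Report a failure only after `retries` consecutive failed tests."""

    def __init__(self, retries: int) -> None:
        if retries < 1:
            raise ValueError("retries must be at least 1")
        self.retries = retries
        self.failed_in_a_row = 0

    def record(self, test_passed: bool) -> bool:
        """Record one test result; return True once the threshold is reached."""
        if test_passed:
            self.failed_in_a_row = 0  # a success resets the count
        else:
            self.failed_in_a_row += 1
        return self.failed_in_a_row >= self.retries


detector = ConsecutiveFailureDetector(retries=3)
# One result per Interval: two failures, a success, then three failures.
results = [detector.record(ok) for ok in (False, False, True, False, False, False)]
print(results)  # [False, False, False, False, False, True]
```

Under this reading, a persistent fault would be reported after roughly Retries × Interval seconds, plus any time spent waiting for Timeout. That timing estimate is an assumption, not documented behavior.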

See also

About the HA heartbeat and synchronization

About logging, alert email, and SNMP for HA

Example: Active-passive HA group in gateway mode

Example: Failover scenarios

Monitoring HA status

After you configure HA (see Configuring an HA group), to view the roles and synchronization status of the HA group, go to System > High Availability > Status. You can also manually initiate synchronization and reset the current Effective role to match the initial Configured role.

GUI item

Description

State

Displays the configured HA mode.

Configured role

Displays the configured Role.

In active-active HA, the secondary unit that is the primary backup (if configured) will display Secondary, like other secondary units.

After a failure has been detected, the FortiMail unit may not be acting in the role that it was initially configured for, and then this will not match Effective role. For details, see Combinations of configured and effective HA role.

Effective role

Displays the role that this FortiMail unit is currently operating in, either:

  • Primary: Acting as primary unit.
  • Secondary: Acting as secondary unit.
  • Off: For primary units, this indicates that interface or remote service monitoring has detected a failure and therefore the primary unit went offline and halted HA processes. For secondary units, this indicates that it detected an HA synchronization failure; if sync immediately fails again, then the action in On failure will occur. See also Restart HA.
  • Failed: Service monitoring or network interface monitoring has detected a failure and the diagnostic connection is currently determining if the problem has been corrected or it must perform the action in On failure.
  • Holdoff: For secondary units, this indicates that the primary unit is rebooting and has asked the secondary unit to wait longer than the usual Heartbeat lost threshold so that the reboot can complete. If the primary does not return, then a failure is detected and the secondary unit must perform the action in On failure.

After a failure has been detected, the FortiMail unit may not be acting in the role that it was initially configured for, and then this will not match Configured role. For details, see Combinations of configured and effective HA role.

To restore the FortiMail unit to its initially configured role, in Action, click Restore to configured role.

Member Status

A table with some basic statuses about all FortiMail units that belong to the HA group, including:

  • SN: Serial number.

  • IP: IPv4 address (or IPv6 address) of the network interface for the primary heartbeat.

  • Version: Firmware version. A FortiMail unit must run the same firmware version in order to join the HA group, so that the configuration can be synchronized.

  • Configured: Configured role.

    In addition, if a secondary unit has been configured as the Primary Backup, it is denoted with an icon.

  • Effective: Effective role.

  • Status: Whether or not the HA cluster is synchronized.

  • Up Time: Duration of time that the HA cluster member has been operational.

  • Last Seen: When this FortiMail unit’s HA daemon last communicated with the others in the HA group to make sure that they are available. See also Heartbeat lost threshold and HA base port.

Action

Depending on the context, one or more of the following actions may be available:

  • Start configuration sync: Click to manually initiate configuration synchronization with other FortiMail units in the HA cluster. See also Settings that are not synchronized by HA.

  • Restore to configured role: Click to manually reset the Effective role to match the unit's Configured role.

  • Restart HA: If the primary unit's Effective role is Off and you have fixed the cause of the failure, click to restart HA processes.

See also

About the HA heartbeat and synchronization

About logging, alert email, and SNMP for HA

Configuring an HA group

Service Monitor section

Example: Failover scenarios

Combinations of configured and effective HA role

Role

Effective role

Result

Primary

Primary

Normal for the primary unit of an HA group.

Secondary

Secondary

Normal for the secondary unit of an HA group.

In active-active HA, this can also occur if the primary unit has failed. Most of the secondary units continue to be secondary. If you selected one of them to be the primary backup, however, then its Effective role becomes Primary.

Primary

Off

Either the:

  • primary unit failed, and On failure is Switch off immediately
  • FortiMail unit is starting to operate in HA mode

and its HA processes such as configuration synchronization are stopped. To return it to the originally configured role, see Recovering from a heartbeat link failure.

Note: This is caused by a stopped heartbeat, not remote service monitoring or hardware/interface monitoring.

Secondary

Off

The secondary unit has detected a failure, or the FortiMail unit is starting to operate in HA mode.

After the secondary unit starts and connects with the primary unit to form an HA group, the first configuration synchronization may fail. To prevent both the secondary and primary units from simultaneously acting as primary units, the Effective role becomes Off. If the next synchronization fails, then the secondary unit’s Effective role becomes Primary.

Primary

Failed

Remote service monitoring or local network interface monitoring on the primary unit has detected a failure.

Once the problem that caused the failure has been corrected, the Effective role changes from Failed to either Secondary or Primary, depending on the On failure setting.

Primary

Secondary

The primary unit failed. A secondary unit automatically became the new primary unit. When the failed unit restarted, it detected that there was already a primary unit in the HA group, and so now the failed unit is the new secondary unit.

If you want the failed unit to return to acting as the primary unit, in Action, you must manually select Restore to configured role.

Secondary

Primary

The secondary unit detected that the primary unit failed, and then the secondary unit became the new primary unit.

If you want it to return to acting as the secondary unit, in Action, you must manually select Restore to configured role.
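The combinations in the table above can be condensed into a lookup, for example in a script that interprets values copied from the Status page. This is a minimal sketch; the names ROLE_COMBINATIONS, describe, and needs_restore are hypothetical, and the summaries are paraphrased from the table.

```python
# (configured role, effective role) -> short summary paraphrased from the table
ROLE_COMBINATIONS = {
    ("Primary", "Primary"): "Normal for the primary unit.",
    ("Secondary", "Secondary"): "Normal for a secondary unit.",
    ("Primary", "Off"): "Primary failed with On failure set to switch off, or HA is starting.",
    ("Secondary", "Off"): "Secondary detected a failure, or HA is starting.",
    ("Primary", "Failed"): "Service or interface monitoring detected a failure.",
    ("Primary", "Secondary"): "Failover occurred; the former primary rejoined as secondary.",
    ("Secondary", "Primary"): "Failover occurred; the secondary took over as primary.",
}


def describe(configured: str, effective: str) -> str:
    """Return the summary for a role pair, or a fallback if it is not listed."""
    return ROLE_COMBINATIONS.get(
        (configured, effective), "Combination not listed in the table."
    )


def needs_restore(configured: str, effective: str) -> bool:
    """True when the roles are swapped, so Restore to configured role applies."""
    return {configured, effective} == {"Primary", "Secondary"}


print(describe("Secondary", "Primary"))
print(needs_restore("Secondary", "Primary"))  # True
```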

See also

About the HA heartbeat and synchronization

Monitoring HA status

Configuring an HA group

Service Monitor section

Recovering from a heartbeat link failure

Example: Active-passive HA group in gateway mode

Example: Failover scenarios

Example: Active-passive HA group in gateway mode

In this example, two FortiMail units in gateway mode are configured as an active-passive HA group.

This example describes HA configuration for this scenario. Before beginning, verify that both of the FortiMail units are:

Virtual IP address for active-passive HA failover

VIP transfers to secondary upon failure of primary active-passive HA FortiMail unit

For both FortiMail units:

port1

  • connected to a switch which is connected only to administrator computers
  • administrative access is enabled only on this port

port3

  • connected to a switch which is connected to the remaining private network and, indirectly through a FortiGate, the Internet
  • email connections occur only through this port

port5

  • connected directly to the other FortiMail unit
  • heartbeat and synchronization occurs through this port

port6

  • connected directly to the other FortiMail unit
  • heartbeat and synchronization occurs through this port

When a failover occurs, the secondary unit starts to act as the new primary. Then it must receive email connections. To make this happen, you will configure Virtual IP address (or Virtual IPv6 address). Email connections are to the VIP (not the regular port3 IP address). Initially, the VIP is on the original primary unit's port3. After failover, the secondary unit becomes the new primary and starts to use the VIP on its port3 instead.

This example contains the following topics:

About standalone versus HA deployment

If you want to convert a standalone FortiMail unit to a member of an HA group, it may help to understand how HA and standalone deployments are similar and different.

For example, compare the diagram Virtual IP address for active-passive HA failover with a standalone deployment.

Example network interfaces on a standalone FortiMail

Network interface

IP address

Description

port1

192.168.1.5

Administrative connections to the FortiMail unit

port2, port4

Default

Not connected.

port3

172.16.1.2

  • Email connections to the FortiMail unit
  • Internal DNS PTR, A and AAAA records resolve to this IP address

port5

Default

Not connected.

port6

Default

Not connected.

In both deployments, administrators connect to the IP address of port1. DNS records and email connections use the IP address of port3.

In HA, however, port3 on the primary unit has an additional IP address: the virtual IP address (VIP). Instead of the regular IP address, private network DNS records and email connections point to the VIP. When the primary fails, the secondary unit becomes the new primary, and starts to use the port3 VIP. This causes the network to automatically redirect connections there.

On HA, additionally, port6 is connected. This link is used only by HA heartbeat and synchronization between the primary and secondary unit.

Configuring the DNS records and firewall

In the diagram Virtual IP address for active-passive HA failover, SMTP clients on the private network connect to the virtual IP address of the primary unit. SMTP clients on the Internet, however, connect through the public network, using an IPv4 virtual IP (VIP) on the FortiGate unit. FortiGate policies allow, NAT, and route connections to another VIP on the primary FortiMail unit.

Because of NAT, the public DNS server on the Internet must not use private network IP addresses:

  • A and/or AAAA records resolve fortimail.example.com into the public VIP on the FortiGate unit, not the private network VIP on the FortiMail primary unit
  • PTR records to enable external email servers to use a reverse DNS query to resolve the public VIP on the FortiGate unit into fortimail.example.com
  • MX records to indicate that fortimail.example.com is the email gateway for example.com, like usual
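Before publishing the records above, it can help to confirm that the address you plan to use is not a private one. The following minimal sketch uses Python's standard ipaddress module. The function name usable_in_public_dns is hypothetical, and 8.8.8.8 is used only as a well-known public address for comparison.

```python
import ipaddress


def usable_in_public_dns(address: str) -> bool:
    """Return False for addresses that Python classifies as private."""
    return not ipaddress.ip_address(address).is_private


# The port3 VIP from this example is in 172.16.0.0/12, a private range.
print(usable_in_public_dns("172.16.1.2"))  # False
print(usable_in_public_dns("8.8.8.8"))     # True
```

Python's is_private property also covers some reserved ranges beyond the usual private blocks, so treat a False result from this function as a prompt to double-check, not as a definitive verdict.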

Configuring the primary unit for HA operation

In the standalone gateway mode configuration shown in About standalone versus HA deployment, the FortiMail unit’s port3 IP address is 172.16.1.2. The FortiGate unit is configured to NAT email connections to and from that private network IP address.

To achieve the same result with an active-passive HA group, you will add a virtual IP address of 172.16.1.2 to port3 on the primary unit. Email connections occur through this virtual IP address, instead of the physical IP address. You will also add a heartbeat link between the HA members on port6.

To configure the primary unit for HA

  1. Before you start, verify that the IP address and DNS records match what is shown in Example: Active-passive HA group in gateway mode.

  2. On the primary unit, go to System > Network > Interface.

  3. Configure port6 to 10.0.0.2/255.255.255.0 and port5 to 10.0.1.2/255.255.255.0.

  4. Go to System > High Availability > Configuration.

  5. Configure the following:

    GUI item

    Value

    HA mode

    Active-Passive

    On failure

    Wait for recovery then switch to configured role

    Shared password

    YOUR_HA_PASSWORD

    Member section

    1

    Role

    Primary

    IPv4 address (or IPv6 address)

    10.0.0.2

    10.0.1.2

    Hostname

    Click Use Current Device

    2

    Role

    Secondary

    IPv4 address (or IPv6 address)

    10.0.0.4

    10.0.1.4

    Interface section

    port3

    Heartbeat status

    Disable

    Virtual IP address (or Virtual IPv6 address)

    172.16.1.2/255.255.255.0

    port5

    Heartbeat status

    Enable

    port6

    Heartbeat status

    Enable

  6. Click Apply.

    The FortiMail unit enables active-passive HA mode, and, after determining that there is no other primary unit, sets its Effective role to Primary and adds the virtual IP 172.16.1.2 to port3.

  7. To confirm that the FortiMail unit is acting as the primary unit, go to System > High Availability > Status and compare the Configured role and Effective role. Both should be Primary.

    If the Effective role is not Primary, then the FortiMail unit is not acting as the primary unit. Determine the cause of the failover, then restore the Effective role to match Configured role.

Configuring the secondary unit for HA operation

The following procedure describes how to prepare a FortiMail unit for HA operation as the secondary unit according to the diagram Virtual IP address for active-passive HA failover.

Before beginning this procedure, verify that you have completed the required preparations described in Example: Active-passive HA group in gateway mode. Also verify that you configured the primary unit as described in Configuring the primary unit for HA operation.

To configure the secondary unit for HA

  1. On the secondary unit, go to System > Network > Interface.

  2. Configure port6 to be 10.0.0.4/255.255.255.0 and port5 to be 10.0.1.4/255.255.255.0.

  3. Go to System > High Availability > Configuration.

  4. Configure the following:

    GUI item

    Value

    HA mode

    Active-Passive

    On failure

    Wait for recovery then switch to configured role

    Shared password

    YOUR_HA_PASSWORD

    Member section

    1

    Role

    Primary

    IPv4 address (or IPv6 address)

    10.0.0.2

    10.0.1.2

    2

    Role

    Secondary

    IPv4 address (or IPv6 address)

    10.0.0.4

    10.0.1.4

    Hostname

    Click Use Current Device

    Interface section

    port3

    Heartbeat status

    Disable

    Virtual IP address (or Virtual IPv6 address)

    172.16.1.2/255.255.255.0

    port5

    Heartbeat status

    Enable

    port6

    Heartbeat status

    Enable

  5. Click Apply.

    The FortiMail unit changes to active-passive HA, and, after determining that the primary unit is available, sets its Effective role to Secondary.

  6. To confirm that the FortiMail unit is acting as the secondary unit, go to System > High Availability > Status. Compare the Configured role and Effective role. Both should be Secondary.

    If the Effective role is not Secondary, then the FortiMail unit is not acting as the secondary unit. Determine the cause of the failover, then restore the Effective role to match Configured role.

    Note

    If the heartbeat interfaces are not connected, then the secondary unit cannot connect to the primary unit and a failure will be detected. The secondary unit will change its Effective role to Primary.
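Because a heartbeat link that cannot connect causes the secondary unit to take over, it can be worth checking the addressing plan before you click Apply. This minimal sketch uses the example addresses from the two procedures above and checks that both ends of each heartbeat link are distinct hosts in the same subnet. The names HEARTBEAT and link_is_consistent are hypothetical.

```python
import ipaddress

# Heartbeat addresses from the example tables (port6 and port5 on each unit).
HEARTBEAT = {
    "port6": {"primary": "10.0.0.2/255.255.255.0", "secondary": "10.0.0.4/255.255.255.0"},
    "port5": {"primary": "10.0.1.2/255.255.255.0", "secondary": "10.0.1.4/255.255.255.0"},
}


def link_is_consistent(primary: str, secondary: str) -> bool:
    """Both ends must be different addresses within the same subnet."""
    a = ipaddress.ip_interface(primary)
    b = ipaddress.ip_interface(secondary)
    return a.network == b.network and a.ip != b.ip


for port, ends in HEARTBEAT.items():
    print(port, link_is_consistent(ends["primary"], ends["secondary"]))
```

This checks only the addressing plan. It does not verify cabling or that the heartbeat interfaces are enabled on both units.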

Example: Failover scenarios

Once HA is configured, it starts to automatically monitor the HA group for failures.

Various causes can be detected as a failure, and depending on the On failure setting, the HA group may automatically fail over in order to maintain service availability.

Automatic failover can be configured for active-active HA groups, but in this example, we show active-passive HA. The following abbreviations are used:

  • P1 is the configured primary unit
  • S2 is the configured secondary unit

This topic includes:

Failover scenario 1: Temporary failure of the primary unit

In this scenario, the primary unit (P1) fails because of a software crash or a recoverable hardware failure (in this example, the P1 power cable is unplugged). HA logging and alert email are configured for the HA group.

When the secondary unit (S2) detects that P1 has failed, S2 becomes the new primary unit and continues processing email.

During this process:

  1. The FortiMail HA group is operating normally.

  2. The power cable is accidentally disconnected from P1.

  3. S2’s primary heartbeat test detects that P1 has failed.

    How soon this happens depends on the Heartbeat lost threshold of S2.

  4. The Effective role of S2 changes to Primary.

  5. S2 sends an alert email similar to the following, indicating that S2 has determined that P1 has failed and that S2 is changing its Effective role to Primary.

    This is the HA machine at 172.16.1.6.

    The following event has occurred

    ‘PRIMARY heartbeat disappeared’

    The state changed from ‘SECONDARY’ to ‘PRIMARY’

  6. S2 records the following event log messages (among others) indicating that S2 has determined that P1 has failed and that S2 is changing its Effective role to Primary.

    2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: peer stop responding (heartbeat), assuming PRIMARY role"

    2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop stopping"

    2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop stopping"

    2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop stopping"

    2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop starting, entering primary mode"

    2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop starting, entering primary mode"

    2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop starting, entering PRIMARY mode"

  7. After P1 recovers from the hardware failure, what happens next depends on P1’s On failure setting.

Failover scenario 2: System reboot or reload of the primary unit

If you need to reboot P1 or reload its configuration (not shut it down), such as during a firmware upgrade or a process restart, by using the CLI commands execute reboot or execute reload <httpd...>, or by clicking System > Reboot from the top-right corner of the GUI:

  • P1 will send a command to S2 to wait for the heartbeat and service monitoring signal to resume, so that S2 will not take over the primary role during P1’s reboot.

  • P1 will also send an alert email similar to the following:

    This is the HA machine at 172.16.1.5.

    The following critical event was detected

    The system is rebooting (or reloading)!

  • S2 will wait up to 15 minutes for P1 to return. If P1 fails during the reboot, S2 will become primary.

  • S2 will send an alert email, indicating that S2 received the wait command from P1.

    This is the HA machine at 172.16.1.6.

    The following event has occurred

    ‘peer rebooting (or reloading)’

    The state changed from ‘SECONDARY’ to ‘HOLD_OFF’

When P1 is up again:

  • P1 will send another command to S2 and ask S2 to change its Effective role from Holdoff to Secondary, and to resume monitoring P1’s services and heartbeat.

  • S2 will send an alert email, indicating that S2 received instruction commands from P1.

    This is the HA machine at 172.16.1.6.

    The following event has occurred

    ‘peer command appeared’

    The state changed from ‘HOLD_OFF’ to ‘SECONDARY’

  • S2 logs the event in the HA logs.
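The state changes that S2 reports in scenarios 1 and 2 can be summarized as a small transition table. This is a sketch for reasoning about the alert email, not FortiMail's internal logic. The event strings are taken from the alert email examples, except "holdoff expired", which is a hypothetical name for the case where P1 does not return within the waiting period.

```python
# (current state, event) -> next state, as reported in S2's alert email
TRANSITIONS = {
    ("SECONDARY", "PRIMARY heartbeat disappeared"): "PRIMARY",
    ("SECONDARY", "peer rebooting (or reloading)"): "HOLD_OFF",
    ("HOLD_OFF", "peer command appeared"): "SECONDARY",
    ("HOLD_OFF", "holdoff expired"): "PRIMARY",  # hypothetical event name
}


def next_state(state: str, event: str) -> str:
    """Return the new state, or the current one if the event does not apply."""
    return TRANSITIONS.get((state, event), state)


# Scenario 2: P1 reboots and then returns.
state = "SECONDARY"
for event in ("peer rebooting (or reloading)", "peer command appeared"):
    state = next_state(state, event)
    print(event, "->", state)
```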

Failover scenario 3: System reboot or reload of the secondary unit

If you reboot or reload the configuration of S2 such as during a firmware upgrade or a process restart, by using the CLI commands execute reboot or execute reload <httpd...>, or by clicking System > Reboot from the top-right corner of the GUI, then the behavior of P1 and S2 is as follows:

  • P1 will send an alert email about S2, similar to the following:

    This is the HA machine at 172.16.1.5.

    The following event has occurred

    ‘ha: SECONDARY heartbeat disappeared’

  • S2 will send an alert email similar to the following:

    This is the HA machine at 172.16.1.6.

    The following critical event was detected

    The system is rebooting (or reloading)!

  • P1 will also log this event in the HA logs.

A shutdown (halt) is recorded in the general purpose logs and alert email, but not in HA-specific alert email.

Failover scenario 4: Primary heartbeat link fails

If the primary heartbeat link fails, such as when the cable is accidentally disconnected, and you have not configured a secondary heartbeat link, the FortiMail units in the HA group cannot verify that the other unit is operating, and each assumes that the other has failed. As a result, the Effective role of the secondary unit (S2) changes to Primary, and both FortiMail units act as primary units.

Two primary units connected to the same network may cause IP address conflicts on your network because matching interfaces will have the same IP addresses. Additionally, because the heartbeat link is interrupted, the FortiMail units in the HA group cannot synchronize configuration changes or mail data changes.

Even after reconnecting the heartbeat link, both units will continue operating as primary units. To return the HA group to normal operation, you must connect to the GUI of S2 to manually return it to acting as a secondary unit.

  1. The FortiMail HA group is operating normally.
  2. The heartbeat link Ethernet cable is accidentally disconnected.
  3. S2’s HA heartbeat test detects that the primary unit has failed.

    How soon this happens depends on the HA daemon configuration of S2.

  4. The Effective role of S2 changes to Primary.
  5. S2 sends an alert email similar to the following, indicating that S2 has determined that P1 has failed and that S2 is changing its Effective role to Primary.

    This is the HA machine at 172.16.1.6.

    The following event has occurred

    ‘PRIMARY heartbeat disappeared’

    The state changed from ‘SECONDARY’ to ‘PRIMARY’

  6. S2 records the following event log messages (among others) indicating that S2 has determined that P1 has failed and that S2 is changing its Effective role to Primary.

    2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: peer stop responding (heartbeat), assuming PRIMARY role"

    2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop stopping"

    2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop stopping"

    2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop stopping"

    2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop starting, entering primary mode"

    2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop starting, entering primary mode"

    2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop starting, entering PRIMARY mode"
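The condition described in this scenario, two units acting as primary at the same time, can be detected from the roles that each unit reports. The sketch below assumes that you have collected each member's Effective role yourself, for example from the Status page of each unit. The serial numbers and function names are hypothetical.

```python
def acting_primaries(members: list[dict[str, str]]) -> list[str]:
    """Return serial numbers of units whose Effective role is Primary."""
    return [m["sn"] for m in members if m["effective"] == "Primary"]


def is_split_brain(members: list[dict[str, str]]) -> bool:
    """True when more than one unit is acting as the primary unit."""
    return len(acting_primaries(members)) > 1


# Hypothetical serial numbers; roles as they are after the heartbeat link fails.
members = [
    {"sn": "P1-SERIAL", "configured": "Primary", "effective": "Primary"},
    {"sn": "S2-SERIAL", "configured": "Secondary", "effective": "Primary"},
]
print(is_split_brain(members))  # True
```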

Recovering from a heartbeat link failure

If a hardware failure is not permanent (for example, a temporarily disconnected cable, not a failed port on one of the FortiMail units), then you may want to return both FortiMail units to operating in their configured Role.

To return to normal roles after the heartbeat link fails

  1. Reconnect the primary heartbeat interface by reconnecting the Ethernet cable for the heartbeat link.
  2. Even though the Effective role of S2 is Primary, S2 continues to attempt to find the other primary unit. When the heartbeat link is reconnected, S2 finds P1 and determines that P1's Effective role is also Primary. So S2 sends a heartbeat signal to tell P1 to stop operating as a primary unit. The Effective role of P1 changes to Off.

  3. P1 sends an alert email similar to the following, indicating that P1 has stopped operating as the primary unit.

    This is the HA machine at 172.16.1.5

    The following event has occurred

    'SECONDARY asks us to switch roles (user requested takeover)'

    The state changed from 'PRIMARY' to 'OFF'

  4. P1 records the following event log messages (among others) indicating that P1's Effective role is changing to Off.

    2005-11-30 17:13:06 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: remote detected problem, shutting down"

    2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop stopping"

    2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop stopping"

    2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop stopping"

    2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop starting, entering off mode"

    2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop starting, entering off mode"

    The configured Role of P1 is Primary, but the Effective role is Off.

    The configured Role of S2 is Secondary, but the Effective role is Primary.

    P1 synchronizes the content of its MTA queue directories to S2. Email in these directories can now be delivered by S2.

  5. Connect to the GUI of P1, and go to System > High Availability > Status.

  6. Look for synchronization messages.

    Do not continue to the next step until P1 has synchronized with S2.

  7. Connect to the GUI of S2, go to System > High Availability > Status, and in Action, select Restore to configured role.

    The HA group should return to normal operation. P1 records the following event log message (among others) indicating that S2 asked P1 to return to being the primary unit.

    2005-11-30 18:10:00 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: being asked to assume original role"

  8. P1 and S2 synchronize their MTA queue directories. All email in these directories can now be delivered by P1.

Note

You can mix different FortiMail models in the same HA group. However:

  • All FortiMail units in the HA group must have the same firmware version.
  • Capacity and maximum configuration values are limited by the least powerful model.

To configure FortiMail units in an HA group, you usually connect only to the primary unit. The primary unit’s configuration is almost entirely synchronized to secondary units, so that changes made to the primary unit are propagated to the secondary units.

Exceptions include:

Note

To use FortiGuard Antivirus or FortiGuard Antispam with HA, you must license all FortiMail units in the cluster. Only licensed devices can use the subscription services.

See also

Configuring an HA group

About the HA heartbeat and synchronization

About logging, alert email, and SNMP for HA

Storing mail data from HA groups on a NAS server

Example: Failover scenarios

Example: Active-passive HA group in gateway mode

About the HA heartbeat and synchronization

Heartbeat and synchronization through the primary and secondary heartbeat network interfaces:

Note

Synchronization intervals vary.

  • FortiGuard Antispam and FortiGuard Antivirus packages: Not synchronized.
  • Mail queue: Up to 20 minutes (not real time).
  • Configuration: Real time.

If configuration synchronization did not occur when expected, or if you have inadvertently de-synchronized the secondary unit’s configuration (for example, if a cable was accidentally disconnected), then you can manually initiate synchronization via the GUI or the CLI command diagnose system ha sync on either the primary unit or the secondary unit.

Periodically, the secondary unit verifies that all configuration changes have been synchronized. If they have not, then the secondary unit will pull the configuration changes from the primary unit and reload the new configuration.

Secondary units can also push any changes made to their block and safe lists back to the primary unit. In active-active HA, these changes are then synchronized to all other secondary units.

The secondary unit expects to constantly receive heartbeat traffic from the primary unit. Loss of the heartbeat signal indicates failure of the primary unit, and triggers the action that you select in On failure. For details, see Example: Failover scenarios.

Exceptions include system restarts and the execute reload CLI command. If the primary unit reboots or reloads its configuration, then it signals to the secondary unit to wait for the primary unit to complete the restart or reload. For details, see Failover scenario 2: System reboot or reload of the primary unit.

Behavior when the heartbeat signal is lost varies by HA mode and On failure:

  • Active-passive: The secondary unit becomes the new primary unit and starts receiving email connections. Some in-progress email connections may be interrupted and must be restarted, but most email clients and servers can gracefully handle this.

  • Active-active: If Primary backup has been selected, then your preferred backup unit will take over the role of the primary unit (Effective role becomes Primary).

    If a specific Primary backup is not selected, then each secondary unit continues to operate as a secondary unit. However, with no primary unit, changes to the configuration are not synchronized anymore.
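The two cases above can be condensed into a small decision function. This is a sketch of the documented outcomes only. It does not model the On failure setting, and the function name and return strings are hypothetical.

```python
def on_heartbeat_lost(ha_mode: str, primary_backup_selected: bool = False) -> str:
    """Summarize what happens to the secondary units when the heartbeat is lost."""
    if ha_mode == "active-passive":
        return "secondary becomes primary"
    if ha_mode == "active-active":
        if primary_backup_selected:
            return "primary backup becomes primary"
        return "all secondaries stay secondary; configuration sync stops"
    raise ValueError(f"unknown HA mode: {ha_mode}")


print(on_heartbeat_lost("active-passive"))
print(on_heartbeat_lost("active-active", primary_backup_selected=True))
print(on_heartbeat_lost("active-active"))
```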

For failover examples and steps required to restore the initially configured roles in each case, see Example: Failover scenarios.

Interface monitoring, hard drive monitoring, and remote service monitoring do not provide configuration and data synchronization, and therefore they are not a complete replacement for the heartbeat. However, you can use them as another way to detect failure. See Interface section and Service Monitor section.

See also

About HA modes

About HA port numbers and protocols

About logging, alert email, and SNMP for HA

Settings that are not synchronized by HA

Storing mail data from HA groups on a NAS server

Synchronization of MTA queue directories after a failover

Example: Active-passive HA group in gateway mode

Example: Failover scenarios

About HA port numbers and protocols

The default protocol and port numbers for HA heartbeat, synchronization, and service monitoring communications are configurable. See HA base port, the control-packet-option setting in the FortiMail CLI Reference, and Appendix C: Port Numbers.

Note

If a firewall is between the primary and secondary FortiMail unit, then verify that the firewall policy allows HA port numbers. Blocked HA ports can cause incorrect failover and synchronization failure.

Settings that are not synchronized by HA

All settings on the primary unit are synchronized to the secondary unit, except the following:

Settings

Explanation

Operation mode

You must set the operation mode (gateway, transparent, or server) of each HA group member before configuring HA. Many settings vary by operation mode, and therefore configurations cannot be synchronized if the operation mode is different.

Host name

Different host names are used to distinguish members of the HA cluster when connecting to the GUI and to indicate which unit failed. For details, see Hostname.

Static route

Static routes are not synchronized because some or all of the network interfaces on each FortiMail unit in the HA cluster may be connected to different subnets. See also Configuring static routes.

Interface configuration

(gateway and server mode only)

Administrator connections to the GUI/CLI, alert email, and many other features require that you configure at least one network interface with an IP address. For details, see Configuring the network interfaces.

Exceptions include virtual IP addresses on active-passive HA. Virtual IP addresses are synchronized because, upon failover, the secondary unit must start to use them. This mechanism allows the secondary unit to receive connections instead of the failed primary unit. See Virtual IP address (or Virtual IPv6 address).

Management IP address

(transparent mode only)

Each FortiMail unit in the HA cluster should be configured with different management IP addresses for GUI and CLI connectivity purposes. For details, see About the management IP.

SNMP system information

Each FortiMail unit in the HA cluster will have its own SNMP system information, including the Description, Location, and Contact. For details, see Configuring SNMP queries and traps.

RAID configuration

RAID settings are hardware-dependent and determined at boot time by looking at the drives (for software RAID) or the controller (hardware RAID), and are not stored in the system configuration. Therefore, they are not synchronized.

Some HA settings

Product name and icon

The product name and icon under System > Customization > Appearance are not synchronized. All other appearance settings are synchronized.

Miscellaneous settings
(active-active HA only)

In active-active HA, the following settings are not synchronized:

All system, domain, and user level block/safe lists are synchronized.

Note

User data is synchronized at predefined time intervals, not in real time.

See also

About the HA heartbeat and synchronization

Synchronization of MTA queue directories after a failover

During normal operation in active-passive HA, email messages are either:

  • being received or sent by the primary FortiMail unit
  • waiting to be delivered in the mail queue
  • stored in the primary unit’s mail data directories (email quarantines, email archives, and email inboxes of server mode)

When a failure occurs, sending and receiving is interrupted. The delivery attempt fails, and the sender usually retries to send the email message. However, stored messages remain in the primary unit’s mail data directories.

To prevent data loss when a primary unit fails, you usually should enable Synchronize mail data directory (unless NAS storage is used), but do not need to enable Synchronize MTA queue directory. This is because of an automatic recovery mechanism in FortiMail HA failover.

  1. The secondary or primary backup unit detects that the primary unit has failed, and becomes the new primary unit.

  2. If the former primary unit can reboot, it detects the new primary unit, and becomes a secondary unit.

    Note

    Depending on the On failure setting, you may be required to click Restart HA on a failed primary unit.

  3. The former primary unit pushes its mail queue to the new primary unit.

    This synchronization occurs through the heartbeat link between the primary and secondary units, and prevents duplicate email messages from forming in the primary unit’s mail queue.

  4. The new primary unit delivers email in its mail queues, including email messages synchronized from the new secondary unit.

As a result, if the failed primary unit can restart, no email is lost from the mail queue.
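
The recovery steps above can be pictured with a small Python sketch. It assumes, purely for illustration, that each queued message has a unique queue ID; the function name push_queue and the dictionary representation are hypothetical and do not describe how FortiMail stores its queues.

```python
def push_queue(former_primary_queue: dict[str, str],
               new_primary_queue: dict[str, str]) -> dict[str, str]:
    """Merge the former primary's queue into the new primary's queue.

    Messages are keyed by a hypothetical queue ID, so a message that is
    already present on the new primary is not added a second time.
    """
    merged = dict(new_primary_queue)
    for queue_id, message in former_primary_queue.items():
        # setdefault keeps the existing copy, which avoids duplicates.
        merged.setdefault(queue_id, message)
    return merged


# The former primary restarts as a secondary and pushes its leftover queue.
former = {"q1": "mail to alice", "q2": "mail to bob"}
new = {"q2": "mail to bob", "q3": "mail to carol"}
to_deliver = push_queue(former, new)
```

In this example to_deliver holds three messages: q2 appears once even though both units had a copy, and q1 is recovered from the former primary.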

Even if you choose to synchronize the mail queue, because its contents change very rapidly and synchronization is periodic, there is a chance that some email will not have been synchronized when a failover occurs.

See also

About the HA heartbeat and synchronization

Storing mail data from HA groups on a NAS server

Storing mail data from HA groups on a NAS server

If you have FortiMail units operating in server mode and in an active-active HA group, you must store mail data centrally on a network attached storage (NAS) server, not on each FortiMail unit. Otherwise, email users’ messages and other mail data could be scattered across multiple FortiMail units.

For other HA and operating modes, however, it still may be better to store mail data on a NAS server.

For example, regular NAS server backups help to prevent mail data loss, even if a FortiMail unit has hardware failure. Also, during a temporary failure of a FortiMail unit, you can still access the mail data on the NAS server. When the FortiMail unit restarts, it can usually continue to access and use the mail data stored on the NAS server.

For active-active HA with a NAS server, only the primary unit sends quarantine reports to email users. The primary unit also acts as a proxy between email users and the NAS server when email users use FortiMail webmail to access quarantined email and to configure their own Bayesian filters.

For active-passive HA groups, the primary unit reads and writes all mail data to and from the NAS server in the same way as a standalone unit. If a failover occurs, the new primary unit uses the same NAS server for mail data. The new primary unit can access all mail data that the original primary unit stored on the NAS server. So if you are using a NAS server to store mail data, after a failover, the new primary unit continues operating with no loss of mail data.

Note

If the FortiMail unit is a member of an active-passive HA group, and the HA group stores mail data on a remote NAS server, disable mail data synchronization to prevent duplicate mail data traffic.

For instructions on storing mail data on a NAS server, see Selecting the mail data storage location.
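
As a planning aid, the storage rules in this section can be written as a short Python check. This is a sketch under stated assumptions: the function storage_warnings and its string arguments are hypothetical names for illustration, not FortiMail settings or API calls.

```python
def storage_warnings(ha_mode: str, operation_mode: str,
                     uses_nas: bool, sync_mail_data: bool) -> list[str]:
    """Return warnings for a planned HA storage design (empty list means no issue found)."""
    warnings = []
    if ha_mode == "active-active" and operation_mode == "server" and not uses_nas:
        # Without central storage, mailboxes could be scattered across units.
        warnings.append("active-active server mode must store mail data on a NAS server")
    if ha_mode == "active-passive" and uses_nas and sync_mail_data:
        # The NAS already holds the data, so synchronizing it duplicates traffic.
        warnings.append("disable mail data synchronization when mail data is on NAS")
    if ha_mode == "active-passive" and not uses_nas and not sync_mail_data:
        warnings.append("mail data is neither synchronized nor stored on NAS")
    return warnings
```

For example, storage_warnings("active-passive", "gateway", uses_nas=True, sync_mail_data=True) returns a single warning that recommends disabling mail data synchronization.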

See also

About the HA heartbeat and synchronization

Synchronization of MTA queue directories after a failover

About logging, alert email, and SNMP for HA

For faster discovery and diagnosis of network problems that have caused an HA failover, you can configure SNMP, Syslog, and/or alert email to monitor the HA cluster.

To configure logging and alert email, configure the primary unit and enable HA events. When the configuration changes are synchronized to the secondary units, all FortiMail units in the HA group record their own separate log messages and send separate alert email messages. Log data is not synchronized.

Note

To distinguish alert email from each member of the HA cluster, configure a different host name for each member. For details, see Hostname.

To use SNMP to monitor HA failover, configure each cluster member to enable HA events for the SNMP community, such as:

See also

Configuring SNMP queries and traps

Logs, reports, and alerts

About the HA heartbeat and synchronization

Configuring an HA group

To deploy FortiMail units as a high availability (HA) cluster, perform the following steps in order.

To deploy an HA group

  1. Register all FortiMail units in the HA cluster with the Fortinet Technical Support web site:

    https://support.fortinet.com/

    If you use licensed features such as centralized HA monitoring, FortiGuard Antivirus, and/or FortiGuard Antispam, also purchase and register licenses for all units.

  2. Connect the network interfaces that will be used for the heartbeat and synchronization between FortiMail units in the HA cluster. At least one heartbeat link is required.

    For example, you could use a network cable to connect FortiMail A's port2 to FortiMail B's port2.

    Note

    Don't disconnect the heartbeat once HA is enabled. If the heartbeat is accidentally interrupted for an active-passive HA group, such as when a network cable is temporarily disconnected, the secondary unit will assume that the primary unit has failed, and become the new primary unit. If no failure has actually occurred, both FortiMail units will be operating as primary units at the same time. This can cause an IP address conflict. In active-active HA groups, configuration synchronization can be disrupted. For details on correcting this, see Restore to configured role.

    Caution

    For better heartbeat reliability, create two heartbeat links: a primary and a secondary. Directly link the pair of heartbeat ports with an Ethernet crossover cable, or connect them through a dedicated local switch that is not connected to your overall network. This ensures enough bandwidth and low latency for the synchronization and heartbeat. If the heartbeat is interrupted, then a failover may occur. See also About the HA heartbeat and synchronization.

  3. If you are making an active-passive HA group, and the operation mode is gateway or server, add a Virtual IP address (or Virtual IPv6 address) and Virtual hostname to the network interface that will receive email connections. Update DNS records to use this virtual IP address, not the physical IP address. Wait for the DNS records to propagate to non-authoritative DNS servers before you enable HA.
  4. If you are making an active-active HA group, configure storage of mail data on a NAS server. See Storing mail data from HA groups on a NAS server. (Active-passive members can also benefit from a NAS server, but do not require it.)

    Caution

    For active-active HA, if the FortiMail unit is operating in server mode, you must store mail data externally on a NAS server. Failure to store mail data externally could result in mailboxes and other data scattered over multiple FortiMail units.

  5. On each member of the HA group, go to System > High Availability > Configuration and:

    1. Configure the following:

      GUI item

      Description

      State

      Enable or disable HA.

      HA mode

      Select either Active-Active or Active-Passive. For details, see About HA modes.

      On failure

      Select what the HA group will do when it detects a failure, either:

      • Switch off immediately: On recovery, do not process email or join the HA group until you manually select the Effective role (see Restart HA and Restore to configured role).
      • Wait for recovery: On recovery, the failed primary unit’s Effective role becomes Secondary. To manually restore the FortiMail unit to acting in its configured Role, see Restore to configured role.
      • Wait for recovery and switch to configured role: On recovery, the failed primary unit's Effective role automatically becomes Primary again, and the secondary unit that was temporarily acting as primary automatically becomes Secondary again. This option may be useful if the cause of failure is temporary and rare, but may cause problems if the cause of failure is recurring, resulting in many extra role changes.

      Tip: In most cases, you should select Wait for recovery.

      Shared password

      Enter an HA password for the HA group members.

      Before HA group members synchronize with each other, they verify that they have the same shared password. This prevents them from accidentally synchronizing with FortiMail units that do not belong to the same cluster. Therefore you must add the shared HA password to each unit in the HA group.

    2. Expand the Member section. For each FortiMail unit in the HA group, click New and configure the following:

      GUI item

      Description

      Role

      Select the role of the FortiMail unit in the HA group, either Primary or Secondary.

      Each HA group member's role is not synchronized because this distinguishes the primary and secondary units.

      Effects of the role vary by HA mode. See About HA modes.

      IPv4 address
      (or IPv6 address)

      Enter the IP address of the network interface that will listen for the heartbeat and synchronization on the primary or secondary (depending on which entry you are currently configuring in the table).

      If you want more heartbeat interfaces, click + and then add those IP addresses.

      Alternatively, if you are currently configuring the device that you are adding to the table, click Use Current Device.

      Note: You must also bring up and then enable Heartbeat status on the interface. If it is disabled, but the IP address is configured here, then HA will detect that the heartbeat link has failed.

      Hostname

      Displays the hostname of the primary or secondary (depending on which entry you are currently configuring in the table).

      Note: Do not configure the hostname here. It will not update the hostname used by the FortiMail unit's SMTP relay/proxy. Instead, configure Host name in the mail settings and Virtual hostname, and then click Use Current Device to automatically paste the hostname into this field.

      Primary backup

      (Active-active secondary units only)

      If HA mode is Active-Active, then there can be many secondary units. Enable this setting if Role is Secondary, and you want to select this member to become the new primary when a failure is detected.

      Note: Usually you should have a primary backup. Otherwise configuration synchronization will be interrupted upon failure. See About the HA heartbeat and synchronization.

      Comment

      Optional. Enter a descriptive comment.

    3. If the HA group is active-passive, configure the Virtual IP address (or Virtual IPv6 address) that will transfer upon failover.
    4. If the HA group stores mail data on NAS, disable Synchronize mail data directory.

    5. Optionally, configure:

    6. Click Apply on the primary unit, and then on the secondary units.
  6. If the HA group is active-active, configure the load balancer with either remote service monitoring or interface monitoring to detect failed FortiMail units, and to redirect connections to available FortiMail units.
  7. Monitor the status of each cluster member. For details, see Monitoring HA status, Logs, reports, and alerts, and Centrally monitoring the HA cluster.
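
The three On failure options in step 5 differ mainly in the Effective role that a failed primary unit has once it recovers. The following Python sketch restates that mapping as a lookup table. The names ON_FAILURE_RESULT and recovered_primary_role are hypothetical and for illustration only.

```python
# Effective role of a failed primary unit on recovery, per On failure option.
ON_FAILURE_RESULT = {
    # Stays out of the HA group until an administrator intervenes.
    "Switch off immediately": "Off",
    # Rejoins as a secondary; restoring the configured role is manual.
    "Wait for recovery": "Secondary",
    # Automatically takes back the primary role.
    "Wait for recovery and switch to configured role": "Primary",
}


def recovered_primary_role(on_failure: str) -> str:
    """Look up the Effective role of a recovered primary unit."""
    try:
        return ON_FAILURE_RESULT[on_failure]
    except KeyError:
        raise ValueError(f"unknown On failure option: {on_failure!r}") from None
```

Seen this way, the recommended Wait for recovery option is the one that rejoins the group without triggering a second role change.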

See also

About HA modes

About the HA heartbeat and synchronization

Settings that are not synchronized by HA

Example: Active-passive HA group in gateway mode

Example: Failover scenarios

Advanced Option section

  1. Go to System > High Availability > Configuration.

  2. Expand the Advanced Option section.

  3. Configure the following and then click Apply:

    GUI item

    Description

    Synchronize mail data directory

    (Active-Passive only)

    Enable if the HA group does not store its mail data on a NAS server, in order to synchronize system quarantine, per-recipient quarantines, email archives, email users’ preferences, and (server mode only) mailboxes with the HA group members. See Storing mail data from HA groups on a NAS server.

    If mail data changes frequently, you can manually initiate a data synchronization when significant changes are complete. For details, see Start configuration sync.

    Synchronize MTA queue directory

    (Active-Passive only)

    Enable if you want to synchronize the mail queue with the HA group members.

    Caution: If the primary unit experiences a hardware failure and you cannot restart it, and if this option is disabled, MTA queue directory data could be lost.

    Note: Enabling this option can affect the FortiMail unit’s performance, because periodic synchronization of the mail queue can be processor and bandwidth-intensive. Additionally, because the content of the MTA queue directories is very dynamic, periodically synchronizing MTA queue directories between FortiMail units may not guarantee against loss of all email in those directories. Even if MTA queue directory synchronization is disabled, after a failover, a separate synchronization mechanism may successfully prevent loss of MTA queue data. For details, see Synchronization of MTA queue directories after a failover and Managing the mail queue.

    HA base port

    Enter the first of multiple port numbers (see Appendix C: Port Numbers) that will be used for:

    • heartbeat signals
    • synchronization control
    • data synchronization
    • configuration synchronization
    Note

    For both active-active and active-passive HA, in addition or alternatively to configuring the heartbeat, you can configure service monitoring. For details, see Service Monitor section and About the HA heartbeat and synchronization.

    Note

    In addition to automatic immediate and periodic configuration synchronization, you can also manually initiate synchronization. For details, see Start configuration sync.

    Heartbeat lost threshold

    Enter the total amount of time, in seconds, that a FortiMail unit can be unresponsive before HA detects a failure and performs the action in On failure.

    Tip: The heartbeat verifies availability every 1 second. To prevent unnecessary failover when the primary unit is temporarily experiencing very heavy load and therefore heartbeat responses are slow, configure a longer threshold (for example, 3 seconds or more) to allow the secondary unit enough time to send more heartbeat signals to confirm unresponsiveness. To determine the best heartbeat threshold, it is useful to know your FortiMail unit's performance baseline and peaks. See also Establish a system baseline and Troubleshoot resource issues.

    Note

    If you have service level agreements (SLA), then you may be required to keep this time short. If the failure detection time is too long, email delivery could be delayed or fail until HA detects the failure. This reduces service uptime.

    Remote services as heartbeat

    Enable to avoid the On failure action if both the primary and secondary heartbeat links temporarily fail, but remote service monitoring detects that the FortiMail unit is still available.

    Note

    The On failure action can still occur if the HA process restarts due to system reboot or HA daemon restart. Then it examines the physical heartbeat links first. If they are not found, then failure is detected.

    This setting provides an extra HA heartbeat only, not synchronization. To avoid synchronization problems, do not use remote service monitoring as a heartbeat for a long time. This feature is intended only as a temporary heartbeat until you reestablish a normal primary or secondary heartbeat link.
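
To see how Heartbeat lost threshold trades off detection speed against tolerance for slow responses, consider this minimal Python sketch. It assumes that failure is declared once the time since the last heartbeat reaches the threshold; the class HeartbeatWatcher is hypothetical and only models the idea, not the actual HA daemon.

```python
class HeartbeatWatcher:
    """Tracks the last heartbeat time and applies a lost-heartbeat threshold."""

    def __init__(self, lost_threshold_s: float, now: float = 0.0) -> None:
        self.lost_threshold_s = lost_threshold_s
        self.last_seen = now

    def heartbeat(self, now: float) -> None:
        """Record that a heartbeat arrived at time `now` (in seconds)."""
        self.last_seen = now

    def failure_detected(self, now: float) -> bool:
        """True once the peer has been silent for the whole threshold."""
        return now - self.last_seen >= self.lost_threshold_s


watcher = HeartbeatWatcher(lost_threshold_s=3)
watcher.heartbeat(now=10)
slow_but_alive = watcher.failure_detected(now=12)   # 2 s of silence
declared_failed = watcher.failure_detected(now=13)  # 3 s of silence
```

With a 3 second threshold and a heartbeat every second, a busy primary unit can miss two heartbeats in a row without causing a failover. A shorter threshold detects real failures sooner but tolerates less delay.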

Interface section

In a basic HA deployment, the heartbeat interface provides a basic signal to other HA group members about the health of the primary FortiMail unit. However, you can use additional signals. Interface monitoring periodically tests the local network interfaces on the primary unit. If a malfunctioning interface is detected, HA performs the action configured in On failure.

  1. Optionally, configure the interface monitoring interval and failure detection threshold. See Service Monitor section.
  2. Go to System > High Availability > Configuration.

  3. Expand the Interface section.

  4. Select a row for a network interface in the table, and then click Edit.

  5. Configure the following settings:

    GUI item

    Description

    Heartbeat status

    Enable if this interface will listen for HA heartbeat and synchronization communications.

    Note

    You must enable at least one of the heartbeat interfaces that you defined in IPv4 address (or IPv6 address). Otherwise HA will detect a failure.

    Port

    Displays the name of the network interface that you are configuring.

    Optionally, you can click the name to view or configure its settings. See also Configuring the network interfaces.

    Virtual IP address (or Virtual IPv6 address)

    Enter a virtual IP address that the primary unit will have on this network interface. Upon failure detection, the secondary will become the new primary and start to use the virtual IP address.

    For gateway mode and server mode deployments, DNS records should be configured to point to the virtual IP address, not physical IP addresses. See also About HA modes, Configuring the network interfaces, About IPv6 Support.

    This setting is available only if HA mode is Active-Passive.

    Note

    The interface IP address must be different from, but on the same subnet as, the IP addresses of the other heartbeat network interfaces of other members in the HA group.

    When configuring other FortiMail units in the HA group, use this value as the:

    • Remote peer IP (for active-passive HA)
    • Primary configuration (for secondary units in active-active HA)
    • Peer systems (for the primary unit in active-active HA)

    Virtual hostname

    Enter a virtual hostname.

    Similar to behavior with the virtual IP address, the virtual hostname belongs to the current primary unit. Upon failover, the secondary unit becomes the new primary unit, and so it starts to use the virtual hostname instead.

    This setting is available only if HA mode is Active-Passive.

    Enable port monitor

    Enable to monitor a physical network port for failure. If the port fails, a failure is detected by the HA cluster.
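
The note above requires each heartbeat interface IP address to be different from, but on the same subnet as, the heartbeat addresses of the other HA group members. Before you enter the addresses, you can check a plan with Python's standard ipaddress module. This is a sketch: the function heartbeat_addresses_ok and the example addresses are hypothetical.

```python
import ipaddress


def heartbeat_addresses_ok(local: str, peers: list[str]) -> bool:
    """Check that `local` differs from every peer but shares its subnet.

    Addresses are given in CIDR form, for example "10.0.0.1/24".
    """
    local_if = ipaddress.ip_interface(local)
    for peer in peers:
        peer_if = ipaddress.ip_interface(peer)
        if peer_if.ip == local_if.ip:
            return False  # duplicate address
        if peer_if.network != local_if.network:
            return False  # different subnet
    return True
```

For example, heartbeat_addresses_ok("10.0.0.1/24", ["10.0.0.2/24"]) returns True, while pairing 10.0.0.1/24 with 10.0.1.2/24 returns False because the peers are on different subnets.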

Service Monitor section

Failed FortiMail units, in the simplest HA deployments, are detected by an interrupted heartbeat. However, HA can also detect failure of hardware and network services. Heartbeats detect the general responsiveness of a primary unit, but do not test each daemon (for example, POP3 or webmail service), hard drive, and physical network ports used by non-heartbeat traffic. Therefore you can add hardware and service monitoring to be more specific. Alternatively, if the heartbeat link is briefly disconnected, remote service monitoring can prevent an unnecessary failover by temporarily acting as a secondary heartbeat.

With remote service monitoring, the secondary unit connects to the SMTP, POP3, and/or web service (HTTP) on the primary unit to detect failure. For server mode, IMAP service can also be monitored.

With local network interface monitoring and hard drive monitoring, the primary unit monitors its own network interfaces and hard drives. Hard drive monitoring tests that the local hard drive is still accessible, and disk space exists for mail data. If the hard disk is not responsive, or if the mail data disk is 95% full, then a failure is detected.
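
The hard drive rule can be expressed as a short Python sketch. It assumes that "95% full" means used space is at least 95% of total space, and it treats an unreadable disk as unresponsive. The function names are hypothetical, and shutil.disk_usage stands in for whatever check the FortiMail unit performs internally.

```python
import shutil

FULL_THRESHOLD_PERCENT = 95  # from the rule above


def disk_failure_detected(used_bytes: int, total_bytes: int,
                          threshold_percent: int = FULL_THRESHOLD_PERCENT) -> bool:
    """True if the disk is at or above the fullness threshold."""
    if total_bytes <= 0:
        return True  # no usable capacity reported: treat as unresponsive
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return used_bytes * 100 >= total_bytes * threshold_percent


def check_mail_data_path(path: str) -> bool:
    """Apply the rule to a real mount point; an I/O error counts as a failure."""
    try:
        usage = shutil.disk_usage(path)
    except OSError:
        return True
    return disk_failure_detected(usage.used, usage.total)
```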

Network interface monitoring tests all network interfaces where:

Alert email, log messages, and SNMP traps (if configured) indicate the specific cause.

For example, if service monitoring detects failure of port2 on the primary unit, it records this log message:

date=2005-11-18 time=18:20:31 device_id=FE-4002905500194 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: local problem detected (port2), shutting down"

and sends this alert email:

Subject: monitord: local problem detected (port2), shutting down [primary-host-name]

This is the FortiMail HA unit at 10.0.0.1.

A local problem (port2) has been detected, telling remote to take over and shutting down.
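
If you forward these log messages to your own tooling, the key=value layout shown above is straightforward to parse. The following Python sketch uses the standard shlex module so that the quoted msg field stays in one piece. The function name parse_log_line is hypothetical, and the sample line is the one from this section.

```python
import shlex


def parse_log_line(line: str) -> dict[str, str]:
    """Split a key=value log line into a dictionary, honoring double quotes."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            fields[key] = value
    return fields


sample = (
    'date=2005-11-18 time=18:20:31 device_id=FE-4002905500194 '
    'log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha '
    'action=unknown status=success '
    'msg="monitord: local problem detected (port2), shutting down"'
)
event = parse_log_line(sample)
is_ha_event = event["type"] == "event" and event["subtype"] == "ha"
```

Filtering on subtype=ha, as in is_ha_event, is one way to separate HA events from other event logs before alerting on them.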

To configure hardware and service monitoring

  1. Go to System > High Availability > Configuration.

  2. Expand the Service Monitor section.

  3. Select a row in the table and click Edit.

    For Remote SMTP, Remote IMAP, Remote POP, and Remote HTTP services, configure the following and click OK:

    GUI item

    Description

    Enable

    Enable or disable monitoring for the selected service.

    Name

    Displays the service name.

    Port

    Enter the listening port number of the service on the primary FortiMail and (active-active HA only) secondary. See also Appendix C: Port Numbers.

    Timeout

    Enter the amount of time in seconds to wait for a response to the connection.

    Interval

    Enter the time in seconds between each test.

    Retries

    Enter the number of consecutively failed tests that indicate a failure.

    For interface monitoring, configure the following and click OK (to configure which ports are monitored, see Interface section):

    GUI item

    Description

    Interval

    Enter the time in seconds between each test.

    Retries

    Enter the number of consecutively failed tests that indicate a failure.

    For local hard drive monitoring, configure the following and click OK:

    GUI item

    Description

    Enable

    Enable or disable monitoring of the local hard drive.

    Interval

    Enter the time in seconds between each test.

    Retries

    Enter the number of consecutively failed tests that indicate a failure.
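
The Timeout, Interval, and Retries settings work together: each test waits up to Timeout seconds, tests run every Interval seconds, and a failure is declared after Retries consecutive failed tests. The Python sketch below models that logic. The function names are hypothetical, and the TCP connection test is only an approximation of a service check, not the probe that FortiMail actually uses.

```python
import socket


def probe(host: str, port: int, timeout_s: float) -> bool:
    """One test: can a TCP connection be opened within the timeout?"""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def failure_detected(results: list[bool], retries: int) -> bool:
    """True if `results` contains `retries` consecutive failed tests."""
    consecutive = 0
    for ok in results:
        # A successful test resets the count of consecutive failures.
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= retries:
            return True
    return False
```

For example, with Retries set to 3, the results [False, False, True, False] do not indicate a failure, because the successful third test resets the count.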

See also

About the HA heartbeat and synchronization

About logging, alert email, and SNMP for HA

Example: Active-passive HA group in gateway mode

Example: Failover scenarios

Monitoring HA status

After you configure HA (see Configuring an HA group), to view the roles and synchronization status of the HA group, go to System > High Availability > Status. You can also manually initiate synchronization and reset the current Effective role to match the initial Configured role.

GUI item

Description

State

Displays the configured HA mode.

Configured role

Displays the configured Role.

In active-active HA, the secondary unit that is the primary backup (if configured) will display Secondary, like other secondary units.

After a failure has been detected, the FortiMail unit may not be acting in the role that it was initially configured for, and then this will not match Effective role. For details, see Combinations of configured and effective HA role.

Effective role

Displays the role that this FortiMail unit is currently operating in, either:

  • Primary: Acting as primary unit.
  • Secondary: Acting as secondary unit.
  • Off: For primary units, this indicates that interface or remote service monitoring has detected a failure and therefore the primary unit went offline and halted HA processes. For secondary units, this indicates that it detected an HA synchronization failure; if sync immediately fails again, then the action in On failure will occur. See also Restart HA.
  • Failed: Service monitoring or network interface monitoring has detected a failure and the diagnostic connection is currently determining if the problem has been corrected or it must perform the action in On failure.
  • Holdoff: For secondary units, this indicates that the primary unit is rebooting and asked to wait longer than the usual Heartbeat lost threshold so that the reboot can complete. If the primary does not return, then a failure is detected and it must perform the action in On failure.

After a failure has been detected, the FortiMail unit may not be acting in the role that it was initially configured for, and then this will not match Configured role. For details, see Combinations of configured and effective HA role.

To restore the FortiMail unit to its initially configured role, in Action, click Restore to configured role.

Member Status

A table with some basic statuses about all FortiMail units that belong to the HA group, including:

  • SN: Serial number.

  • IP: IPv4 address (or IPv6 address) of the network interface for the primary heartbeat.

  • Version: Firmware version. A FortiMail unit must run the same firmware version in order to join the HA group, so that the configuration can be synchronized.

  • Configured: Configured role.

    In addition, if a secondary unit has been configured as the Primary Backup, it is denoted with an icon.

  • Effective: Effective role.

  • Status: Whether or not the HA cluster is synchronized.

  • Up Time: Duration of time that the HA cluster member has been operational.

  • Last Seen: When this FortiMail unit’s HA daemon last communicated with the others in the HA group to make sure that they are available. See also Heartbeat lost threshold and HA base port.

Action

Depending on the context, one or more of the following actions may be available:

  • Start configuration sync: Click to manually initiate configuration synchronization with other FortiMail units in the HA cluster. See also Settings that are not synchronized by HA.

  • Restore to configured role: Click to manually reset the Effective role to match the unit's Configured role.

  • Restart HA: If the primary unit's Effective role is Off, and you have fixed the cause of the failure, click to restart HA processes.

See also

About the HA heartbeat and synchronization

About logging, alert email, and SNMP for HA

Configuring an HA group

Service Monitor section

Example: Failover scenarios

Combinations of configured and effective HA role

Role

Effective role

Result

Primary

Primary

Normal for the primary unit of an HA group.

Secondary

Secondary

Normal for the secondary unit of an HA group.

In active-active HA, this can also occur if the primary unit has failed. Most of the secondary units continue to be secondary. If you selected one of them to be the primary backup, however, then its Effective role becomes Primary.

Primary

Off

Either the:

  • primary unit failed, and On failure is Switch off immediately
  • FortiMail unit is starting to operate in HA mode

and its HA processes such as configuration synchronization are stopped. To return it to the originally configured role, see Recovering from a heartbeat link failure.

Note: This is caused by a stopped heartbeat, not remote service monitoring or hardware/interface monitoring.

Secondary

Off

The secondary unit has detected a failure, or the FortiMail unit is starting to operate in HA mode.

After the secondary unit starts and connects with the primary unit to form an HA group, the first configuration synchronization may fail. To prevent both the secondary and primary units from simultaneously acting as primary units, the Effective role becomes Off. If the next synchronization fails, then the secondary unit’s Effective role becomes Primary.

Primary

Failed

Remote service monitoring or local network interface monitoring on the primary unit has detected a failure.

Once the problem that caused the failure has been corrected, the Effective role changes from Failed to either Secondary or Primary, depending on the On failure setting.

Primary

Secondary

The primary unit failed. A secondary unit automatically became the new primary unit. When the failed unit restarted, it detected that there was already a primary unit in the HA group, and so now the failed unit is the new secondary unit.

If you want the failed unit to return to acting as the primary unit, in Action, you must manually select Restore to configured role.

Secondary

Primary

The secondary unit detected that the primary unit failed, and then the secondary unit became the new primary unit.

If you want it to return to acting as the secondary unit, in Action, you must manually select Restore to configured role.
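The combinations in the table above can be summarized as a small lookup, which may be handy when reviewing status pages for several units. This is only an illustrative sketch: the dictionary, the function names, and the lowercase role strings are hypothetical and are not part of any FortiMail API.

```python
# Illustrative summary of the "Combinations of configured and effective HA role" table.
# All names here are hypothetical; role values are typed in by hand from the Status page.
ROLE_COMBINATIONS = {
    ("primary", "primary"): "Normal for the primary unit.",
    ("secondary", "secondary"): "Normal for a secondary unit.",
    ("primary", "off"): "Primary failed with On failure set to Switch off immediately, "
                        "or HA is starting; HA processes are stopped.",
    ("secondary", "off"): "Secondary detected a failure, or HA is starting.",
    ("primary", "failed"): "Remote service or local interface monitoring detected a failure.",
    ("primary", "secondary"): "Primary failed and rejoined as secondary; "
                              "use Restore to configured role to revert.",
    ("secondary", "primary"): "Secondary took over after the primary failed; "
                              "use Restore to configured role to revert.",
}


def describe_roles(configured: str, effective: str) -> str:
    """Return the table's explanation for a configured/effective role pair."""
    key = (configured.strip().lower(), effective.strip().lower())
    return ROLE_COMBINATIONS.get(key, "Combination not listed in the table.")


def needs_attention(configured: str, effective: str) -> bool:
    """True when the Effective role differs from the Configured role."""
    return configured.strip().lower() != effective.strip().lower()


if __name__ == "__main__":
    print(describe_roles("Secondary", "Primary"))
    print(needs_attention("Primary", "Primary"))
```

Any pair where the two roles differ is worth investigating before you select Restore to configured role.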

See also

About the HA heartbeat and synchronization

Monitoring HA status

Configuring an HA group

Service Monitor section

Recovering from a heartbeat link failure

Example: Active-passive HA group in gateway mode

Example: Failover scenarios

Example: Active-passive HA group in gateway mode

In this example, two FortiMail units in gateway mode are configured as an active-passive HA group.

This example describes HA configuration for this scenario. Before beginning, verify that both of the FortiMail units are:

Virtual IP address for active-passive HA failover

VIP transfers to secondary upon failure of primary active-passive HA FortiMail unit

For both FortiMail units:

port1

  • connected to a switch which is connected only to administrator computers
  • administrative access is enabled only on this port

port3

  • connected to a switch which is connected to the remaining private network and, indirectly through a FortiGate, the Internet
  • email connections occur only through this port

port5

  • connected directly to the other FortiMail unit
  • heartbeat and synchronization occur through this port

port6

  • connected directly to the other FortiMail unit
  • heartbeat and synchronization occur through this port

When a failover occurs, the secondary unit starts to act as the new primary. Then it must receive email connections. To make this happen, you will configure Virtual IP address (or Virtual IPv6 address). Email connections are to the VIP (not the regular port3 IP address). Initially, the VIP is on the original primary unit's port3. After failover, the secondary unit becomes the new primary and starts to use the VIP on its port3 instead.

This example contains the following topics:

About standalone versus HA deployment

If you want to convert a standalone FortiMail unit to a member of an HA group, it may help to understand how HA and standalone deployments are similar and different.

For example, compare the diagram Virtual IP address for active-passive HA failover with a standalone deployment.

Example network interfaces on a standalone FortiMail

Network interface

IP address

Description

port1

192.168.1.5

Administrative connections to the FortiMail unit

port2, port4

Default

Not connected.

port3

172.16.1.2

  • Email connections to the FortiMail unit
  • Internal DNS PTR, A and AAAA records resolve to this IP address

port5

Default

Not connected.

port6

Default

Not connected.

In both deployments, administrators connect to the IP address of port1, and DNS records and email connections use an IP address on port3.

In HA, however, port3 on the primary unit has an additional IP address: the virtual IP address (VIP). Private network DNS records and email connections point to the VIP instead of the regular IP address. When the primary unit fails, the secondary unit becomes the new primary and starts to use the port3 VIP. This causes the network to automatically redirect connections there.

Additionally, in HA, port6 is connected. This link is used only by HA heartbeat and synchronization between the primary and secondary units.

Configuring the DNS records and firewall

In the diagram Virtual IP address for active-passive HA failover, SMTP clients on the private network connect to the virtual IP address of the primary unit. SMTP clients on the Internet, however, connect through the public network, using an IPv4 virtual IP (VIP) on the FortiGate unit. FortiGate policies allow, NAT, and route those connections to another VIP on the primary FortiMail unit.

Because of NAT, the public DNS server on the Internet must not use private network IP addresses:

  • A and/or AAAA records resolve fortimail.example.com into the public VIP on the FortiGate unit, not the private network VIP on the FortiMail primary unit
  • PTR records to enable external email servers to use a reverse DNS query to resolve the public VIP on the FortiGate unit into fortimail.example.com
  • MX records to indicate that fortimail.example.com is the email gateway for example.com, like usual
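As a quick sanity check on the records above, you could confirm that no address planned for the public DNS zone falls in a private (RFC 1918) range. The following is a minimal sketch using only the Python standard library. The function names and the 203.0.113.10 address (a documentation-only address standing in for the FortiGate public VIP) are assumptions for illustration, not values from this example.

```python
import ipaddress

# RFC 1918 private IPv4 ranges. Checked explicitly so that documentation
# addresses such as 203.0.113.0/24 are not treated as private.
RFC1918 = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(address: str) -> bool:
    """Return True if the address is an IPv4 address in an RFC 1918 range."""
    ip = ipaddress.ip_address(address)
    return ip.version == 4 and any(ip in net for net in RFC1918)


def private_public_records(records: dict) -> list:
    """Return hostnames whose planned public record points at a private address."""
    return [name for name, addr in records.items() if is_rfc1918(addr)]


if __name__ == "__main__":
    # Hypothetical record plan: the first entry is wrong (FortiMail private VIP).
    plan = {
        "fortimail.example.com": "172.16.1.2",
        "mail2.example.com": "203.0.113.10",
    }
    print(private_public_records(plan))
```

An empty result means none of the planned public records expose a private network address.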

Configuring the primary unit for HA operation

In the standalone gateway mode configuration shown in About standalone versus HA deployment, the FortiMail unit’s port3 IP address is 172.16.1.2. The FortiGate unit is configured to NAT email connections to and from that private network IP address.

To achieve the same result with an active-passive HA group, you will add a virtual IP address of 172.16.1.2 to port3 on the primary unit. Email connections occur through this virtual IP address, instead of the physical IP address. You will also add a heartbeat link between the HA members on port6.

To configure the primary unit for HA

  1. Before you start, verify that the IP address and DNS records match what is shown in Example: Active-passive HA group in gateway mode.

  2. On the primary unit, go to System > Network > Interface.

  3. Configure port6 to 10.0.0.2/255.255.255.0 and port5 to 10.0.1.2/255.255.255.0.

  4. Go to System > High Availability > Configuration.

  5. Configure the following:

    GUI item

    Value

    HA mode

    Active-Passive

    On failure

    Wait for recovery then switch to configured role

    Shared password

    YOUR_HA_PASSWORD

    Member section

    1

    Role

    Primary

    IPv4 address (or IPv6 address)

    10.0.0.2

    10.0.1.2

    Hostname

    Click Use Current Device

    2

    Role

    Secondary

    IPv4 address (or IPv6 address)

    10.0.0.4

    10.0.1.4

    Interface section

    port3

    Heartbeat status

    Disable

    Virtual IP address (or Virtual IPv6 address)

    172.16.1.2/255.255.255.0

    port5

    Heartbeat status

    Enable

    port6

    Heartbeat status

    Enable

  6. Click Apply.

    The FortiMail unit enables active-passive HA mode, and, after determining that there is no other primary unit, sets its Effective role to Primary and adds the virtual IP 172.16.1.2 to port3.

  7. To confirm that the FortiMail unit is acting as the primary unit, go to System > High Availability > Status and compare the Configured role and Effective role. Both should be Primary.

    If the Effective role is not Primary, then the FortiMail unit is not acting as the primary unit. Determine the cause of the failover, then restore the Effective role to match the Configured role.

Configuring the secondary unit for HA operation

The following procedure describes how to prepare a FortiMail unit for HA operation as the secondary unit according to the diagram Virtual IP address for active-passive HA failover.

Before beginning this procedure, verify that you have completed the required preparations described in Example: Active-passive HA group in gateway mode. Also verify that you configured the primary unit as described in Configuring the primary unit for HA operation.

To configure the secondary unit for HA

  1. On the secondary unit, go to System > Network > Interface.

  2. Configure port6 to be 10.0.0.4/255.255.255.0 and port5 to be 10.0.1.4/255.255.255.0.

  3. Go to System > High Availability > Configuration.

  4. Configure the following:

    GUI item

    Value

    HA mode

    Active-Passive

    On failure

    Wait for recovery then switch to configured role

    Shared password

    YOUR_HA_PASSWORD

    Member section

    1

    Role

    Primary

    IPv4 address (or IPv6 address)

    10.0.0.2

    10.0.1.2

    2

    Role

    Secondary

    IPv4 address (or IPv6 address)

    10.0.0.4

    10.0.1.4

    Hostname

    Click Use Current Device

    Interface section

    port3

    Heartbeat status

    Disable

    Virtual IP address (or Virtual IPv6 address)

    172.16.1.2/255.255.255.0

    port5

    Heartbeat status

    Enable

    port6

    Heartbeat status

    Enable

  5. Click Apply.

    The FortiMail unit changes to active-passive HA, and, after determining that the primary unit is available, sets its Effective role to Secondary.

  6. To confirm that the FortiMail unit is acting as the secondary unit, go to System > High Availability > Status. Compare the Configured role and Effective role. Both should be Secondary.

    If the Effective role is not Secondary, then the FortiMail unit is not acting as the secondary unit. Determine the cause of the failover, then restore the Effective role to match Configured role.

    Note

    If the heartbeat interfaces are not connected, then the secondary unit cannot connect to the primary unit and a failure will be detected. The secondary unit will change its Effective role to Primary.
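Before cabling the units, it can help to double-check the heartbeat addressing used in the two procedures above. The sketch below assumes that both ends of each directly connected heartbeat link should sit in the same subnet and must not share an address. That assumption, and the function and variable names, are illustrative only.

```python
import ipaddress

# Heartbeat addressing from this example: port -> (primary IP, secondary IP, netmask).
HEARTBEAT_LINKS = {
    "port6": ("10.0.0.2", "10.0.0.4", "255.255.255.0"),
    "port5": ("10.0.1.2", "10.0.1.4", "255.255.255.0"),
}


def link_problems(links: dict) -> list:
    """Return a list of human-readable problems found in the heartbeat plan."""
    problems = []
    for port, (primary, secondary, mask) in links.items():
        p = ipaddress.ip_interface(f"{primary}/{mask}")
        s = ipaddress.ip_interface(f"{secondary}/{mask}")
        if p.ip == s.ip:
            problems.append(f"{port}: both members use {p.ip}")
        elif p.network != s.network:
            problems.append(f"{port}: {p.ip} and {s.ip} are in different subnets")
    return problems


if __name__ == "__main__":
    # An empty list means the plan is consistent under the stated assumption.
    print(link_problems(HEARTBEAT_LINKS))
```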

Example: Failover scenarios

Once HA is configured, it starts to automatically monitor the HA group for failures.

Various causes can be detected as a failure. Depending on the On failure setting, the HA group may automatically fail over to maintain service availability.

Automatic failover can be configured for active-active HA groups, but in this example, we show active-passive HA. The following abbreviations are used:

  • P1 is the configured primary unit
  • S2 is the configured secondary unit

This topic includes:

Failover scenario 1: Temporary failure of the primary unit

In this scenario, the primary unit (P1) fails because of a software crash or a recoverable hardware failure (in this example, the P1 power cable is unplugged). HA logging and alert email are configured for the HA group.

When the secondary unit (S2) detects that P1 has failed, S2 becomes the new primary unit and continues processing email.

During this process:

  1. The FortiMail HA group is operating normally.

  2. The power cable is accidentally disconnected from P1.

  3. S2’s primary heartbeat test detects that P1 has failed.

    How soon this happens depends on the Heartbeat lost threshold of S2.

  4. The Effective role of S2 changes to Primary.

  5. S2 sends an alert email similar to the following, indicating that S2 has determined that P1 has failed and that S2 is changing its Effective role to Primary.

    This is the HA machine at 172.16.1.6.

    The following event has occurred

    ‘PRIMARY heartbeat disappeared’

    The state changed from ‘SECONDARY’ to ‘PRIMARY’

  6. S2 records the following event log messages (among others) indicating that S2 has determined that P1 has failed and that S2 is changing its Effective role to Primary.

    2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: peer stop responding (heartbeat), assuming PRIMARY role"

    2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop stopping"

    2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop stopping"

    2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop stopping"

    2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop starting, entering primary mode"

    2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop starting, entering primary mode"

    2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop starting, entering PRIMARY mode"

  7. After P1 recovers from the hardware failure, what happens next depends on P1’s On failure setting.
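If you collect HA event logs for review, the key=value layout of the log messages shown in this scenario is straightforward to parse. The following is a minimal sketch based only on the sample lines above. The function names are hypothetical, and the field layout is assumed to match these samples rather than taken from a formal log specification.

```python
import re
from datetime import datetime

# Matches key=value pairs where the value is either "quoted text" or a bare token.
FIELD_RE = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_ha_log(line: str) -> dict:
    """Parse one log line into a dict with a 'timestamp' plus its key=value fields."""
    date_part, time_part, rest = line.strip().split(" ", 2)
    record = {
        "timestamp": datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")
    }
    for key, quoted, bare in FIELD_RE.findall(rest):
        record[key] = quoted if quoted else bare
    return record


def split_msg(msg: str) -> tuple:
    """Split 'monitord: some text' into ('monitord', 'some text')."""
    daemon, _, text = msg.partition(": ")
    return daemon, text


if __name__ == "__main__":
    sample = (
        '2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=notice '
        'user=ha ui=ha action=unknown status=success '
        'msg="monitord: peer stop responding (heartbeat), assuming PRIMARY role"'
    )
    rec = parse_ha_log(sample)
    print(rec["timestamp"], rec["pri"], split_msg(rec["msg"]))
```

Filtering parsed records on subtype equal to ha and pri equal to notice would surface the role-change messages among the more routine information-level entries.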

Failover scenario 2: System reboot or reload of the primary unit

If you reboot P1 or reload its configuration (rather than shut it down), such as during a firmware upgrade or a process restart, by using the CLI commands execute reboot or execute reload <httpd...>, or by clicking System > Reboot from the top-right corner of the GUI:

  • P1 will send a command to S2 to wait for the heartbeat and service monitoring signal to resume, so that S2 will not take over the primary role during P1’s reboot.

  • P1 will also send an alert email similar to the following:

    This is the HA machine at 172.16.1.5.

    The following critical event was detected

    The system is rebooting (or reloading)!

  • S2 will wait up to 15 minutes for P1 to return. If P1 fails during the reboot, S2 will become primary.

  • S2 will send an alert email, indicating that S2 received the wait command from P1.

    This is the HA machine at 172.16.1.6.

    The following event has occurred

    ‘peer rebooting (or reloading)’

    The state changed from ‘SECONDARY’ to ‘HOLD_OFF’

When P1 is up again:

  • P1 will send another command to S2 and ask S2 to change its Effective role from Holdoff to Secondary, and to resume monitoring P1’s services and heartbeat.

  • S2 will send an alert email, indicating that S2 received instruction commands from P1.

    This is the HA machine at 172.16.1.6.

    The following event has occurred

    ‘peer command appeared’

    The state changed from ‘HOLD_OFF’ to ‘SECONDARY’

  • S2 logs the event in the HA logs.
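The hold-off behavior described in this scenario can be modeled as a small state machine, which may help when reasoning about what S2 will do during a planned reboot. This is a conceptual sketch only. The class and method names are hypothetical, and treating the expiry of the 15-minute wait as the point where S2 becomes primary is an assumption drawn from the description above, not from FortiMail internals.

```python
from dataclasses import dataclass
from typing import Optional

HOLD_OFF_LIMIT_S = 15 * 60  # "S2 will wait up to 15 minutes for P1 to return"


@dataclass
class SecondaryUnit:
    """Conceptual model of S2's Effective role during a reboot of P1."""
    state: str = "SECONDARY"
    hold_started: Optional[float] = None

    def on_peer_rebooting(self, now: float) -> None:
        # P1 announced a reboot or reload: stop monitoring and wait.
        if self.state == "SECONDARY":
            self.state, self.hold_started = "HOLD_OFF", now

    def on_peer_command(self) -> None:
        # P1 is up again and asked S2 to resume monitoring.
        if self.state == "HOLD_OFF":
            self.state, self.hold_started = "SECONDARY", None

    def on_tick(self, now: float) -> None:
        # Assumed: if P1 has not returned within the limit, S2 takes over.
        if self.state == "HOLD_OFF" and now - self.hold_started >= HOLD_OFF_LIMIT_S:
            self.state, self.hold_started = "PRIMARY", None


if __name__ == "__main__":
    s2 = SecondaryUnit()
    s2.on_peer_rebooting(now=0.0)
    s2.on_tick(now=120.0)      # P1 still rebooting, within the limit
    s2.on_peer_command()       # P1 returned
    print(s2.state)
```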

Failover scenario 3: System reboot or reload of the secondary unit

If you reboot or reload the configuration of S2 such as during a firmware upgrade or a process restart, by using the CLI commands execute reboot or execute reload <httpd...>, or by clicking System > Reboot from the top-right corner of the GUI, then the behavior of P1 and S2 is as follows:

  • P1 will send an alert email about S2, similar to the following:

    This is the HA machine at 172.16.1.5.

    The following event has occurred

    ‘ha: SECONDARY heartbeat disappeared’

  • S2 will send an alert email similar to the following:

    This is the HA machine at 172.16.1.6.

    The following critical event was detected

    The system is rebooting (or reloading)!

  • P1 will also log this event in the HA logs.

A shutdown (halt) is recorded in the general-purpose logs and alert email, but it does not generate an HA-specific alert email.

Failover scenario 4: Primary heartbeat link fails

If the primary heartbeat link fails, such as when the cable becomes accidentally disconnected, and if you have not configured a secondary heartbeat link, the FortiMail units in the HA group cannot verify that the other unit is operating, and each assumes that the other has failed. As a result, the Effective role of the secondary unit (S2) changes to Primary, and both FortiMail units act as primary units.

Two primary units connected to the same network may cause IP address conflicts on your network because matching interfaces will have the same IP addresses. Additionally, because the heartbeat link is interrupted, the FortiMail units in the HA group cannot synchronize configuration changes or mail data changes.

Even after reconnecting the heartbeat link, both units will continue operating as primary units. To return the HA group to normal operation, you must connect to the GUI of S2 to manually return it to acting as a secondary unit.
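Because both units keep acting as primary until you intervene, a simple check of the Effective role reported by each unit can flag this condition. The sketch below assumes that you have noted each unit's Effective role by hand from its System > High Availability > Status page. The function name and input format are hypothetical.

```python
def find_dual_primaries(effective_roles: dict) -> list:
    """Return the unit names reporting Primary when more than one unit does."""
    primaries = sorted(
        name for name, role in effective_roles.items()
        if role.strip().lower() == "primary"
    )
    return primaries if len(primaries) > 1 else []


if __name__ == "__main__":
    # After a heartbeat link failure, both units report Primary.
    print(find_dual_primaries({"P1": "Primary", "S2": "Primary"}))
```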

  1. The FortiMail HA group is operating normally.
  2. The heartbeat link Ethernet cable is accidentally disconnected.
  3. S2’s HA heartbeat test detects that the primary unit has failed.

    How soon this happens depends on the HA daemon configuration of S2.

  4. The Effective role of S2 changes to Primary.

  5. S2 sends an alert email similar to the following, indicating that S2 has determined that P1 has failed and that S2 is changing its Effective role to Primary.

    This is the HA machine at 172.16.1.6.

    The following event has occurred

    ‘PRIMARY heartbeat disappeared’

    The state changed from ‘SECONDARY’ to ‘PRIMARY’

  6. S2 records the following event log messages (among others) indicating that S2 has determined that P1 has failed and that S2 is changing its Effective role to Primary.

    2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: peer stop responding (heartbeat), assuming PRIMARY role"

    2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop stopping"

    2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop stopping"

    2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop stopping"

    2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop starting, entering primary mode"

    2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop starting, entering primary mode"

    2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop starting, entering PRIMARY mode"
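The HA alert emails shown in these scenarios follow a fixed layout, so the sending unit, the event, and any state change can be extracted with a few regular expressions. This is a sketch based only on the sample messages in this section. The function name is hypothetical, and real alert emails may differ in wording or quoting.

```python
import re

# The samples use both straight and curly single quotes.
Q = "['\u2018\u2019]"

MACHINE_RE = re.compile(r"This is the HA machine at (\d{1,3}(?:\.\d{1,3}){3})")
EVENT_RE = re.compile(rf"^[ \t]*{Q}(.+){Q}[ \t]*$", re.MULTILINE)
STATE_RE = re.compile(rf"The state changed from {Q}(\w+){Q} to {Q}(\w+){Q}")


def parse_alert(body: str) -> dict:
    """Extract machine IP, event text, and old/new state from an alert email body."""
    machine = MACHINE_RE.search(body)
    event = EVENT_RE.search(body)
    state = STATE_RE.search(body)
    return {
        "machine": machine.group(1) if machine else None,
        "event": event.group(1) if event else None,
        "old_state": state.group(1) if state else None,
        "new_state": state.group(2) if state else None,
    }


if __name__ == "__main__":
    sample = (
        "This is the HA machine at 172.16.1.6.\n\n"
        "The following event has occurred\n\n"
        "\u2018PRIMARY heartbeat disappeared\u2019\n\n"
        "The state changed from \u2018SECONDARY\u2019 to \u2018PRIMARY\u2019\n"
    )
    print(parse_alert(sample))
```

Messages without a state change line, such as the reboot notices, simply return None for old_state and new_state.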

Recovering from a heartbeat link failure

If a hardware failure is not permanent (for example, a temporarily disconnected cable, not a failed port on one of the FortiMail units), then you may want to return both FortiMail units to operating in their configured Role.

To return to normal roles after the heartbeat link fails

  1. Reconnect the primary heartbeat interface by reconnecting the Ethernet cable for the heartbeat link.
  2. Even though the Effective role of S2 is Primary, S2 continues to attempt to find the other primary unit. When the heartbeat link is reconnected, S2 finds P1 and determines that P1's Effective role is also Primary. So S2 sends a heartbeat signal to tell P1 to stop operating as a primary unit. The Effective role of P1 changes to Off.

  3. P1 sends an alert email similar to the following, indicating that P1 has stopped operating as the primary unit.

    This is the HA machine at 172.16.1.5

    The following event has occurred

    'SECONDARY asks us to switch roles (user requested takeover)'

    The state changed from 'PRIMARY' to 'OFF'

  4. P1 records the following event log messages (among others) indicating that P1's Effective role is changing to Off.

    2005-11-30 17:13:06 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: remote detected problem, shutting down"

    2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop stopping"

    2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop stopping"

    2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop stopping"

    2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop starting, entering off mode"

    2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop starting, entering off mode"

    The configured Role of P1 is Primary, but the Effective role is Off.

    The configured Role of S2 is Secondary, but the Effective role is Primary.

    P1 synchronizes the content of its MTA queue directories to S2. Email in these directories can now be delivered by S2.

  5. Connect to the GUI of P1, and go to System > High Availability > Status.

  6. Look for synchronization messages.

    Do not continue to the next step until P1 has synchronized with S2.

  7. Connect to the GUI of S2, go to System > High Availability > Status, and in Action, select Restore to configured role.

    The HA group should return to normal operation. P1 records the following event log message (among others) indicating that S2 asked P1 to return to being the primary unit.

    2005-11-30 18:10:00 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: being asked to assume original role"

  8. P1 and S2 synchronize their MTA queue directories. All email in these directories can now be delivered by P1.