
Administration Guide

Using high availability (HA)

Go to System > High Availability to configure the FortiMail unit to act as a member of a high availability (HA) cluster in order to increase processing capacity or availability.

For the general procedure of how to enable and configure HA, see How to use HA.

This section contains the following topics:

About high availability

FortiMail units can operate in one of two HA modes, active-passive or config-only.

Comparison of HA modes

| Active-passive HA | Config-only HA |
|---|---|
| 2 FortiMail units in the HA group | 2-25 FortiMail units in the HA group |
| Typically deployed behind a switch | Typically deployed behind a load balancer |
| Both configuration* and data synchronized | Only configuration* synchronized |
| Only primary unit processes email | All units process email |
| No data loss when hardware fails | Data loss when hardware fails |
| Failover protection, but no increased processing capacity | Increased processing capacity, but no failover protection |

* For exceptions to synchronized configuration items, see Configuration settings that are not synchronized.

Active-passive HA group operating in gateway mode

Config-only HA group operating in gateway mode

Note

If the config-only HA group is installed behind a load balancer, the load balancer stops sending email to failed FortiMail units. All sessions being processed by the failed FortiMail unit must be restarted and will be re-directed by the load balancer to other FortiMail units in the config-only HA group.

You can mix different FortiMail models in the same HA group. However, all units in the HA group must have the same firmware version.

Note

When mixing FortiMail models, the HA group is limited by the capacity and configuration limits of the least powerful model.
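
The group-size and firmware rules above can be captured in a small planning check. The following Python sketch is illustrative only: the `Unit` record, the `validate_ha_group` function, and the mode strings are hypothetical names chosen for this example and are not part of any FortiMail API. It assumes you have collected each unit's model and firmware version yourself.

```python
from dataclasses import dataclass

# Group-size limits taken from the comparison of HA modes above.
GROUP_SIZE_LIMITS = {
    "active-passive": (2, 2),
    "config-only": (2, 25),
}


@dataclass(frozen=True)
class Unit:
    """Hypothetical inventory record for one FortiMail unit."""
    name: str
    model: str
    firmware: str


def validate_ha_group(mode: str, units: list[Unit]) -> list[str]:
    """Return a list of problems found in a planned HA group (empty if none)."""
    if mode not in GROUP_SIZE_LIMITS:
        return [f"unknown HA mode: {mode}"]
    problems = []
    low, high = GROUP_SIZE_LIMITS[mode]
    if not low <= len(units) <= high:
        expected = str(low) if low == high else f"{low}-{high}"
        problems.append(f"{mode} HA needs {expected} units, got {len(units)}")
    # Models may be mixed, but every unit must run the same firmware version.
    versions = {unit.firmware for unit in units}
    if len(versions) > 1:
        problems.append("firmware versions differ: " + ", ".join(sorted(versions)))
    return problems
```

An empty result means the plan satisfies the two rules checked here. The sketch deliberately allows mixed models, because the guide permits them, and does not model the capacity limits of the least powerful unit.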

Communications between HA cluster members occur through the heartbeat and synchronization connection. For details, see About the heartbeat and synchronization.

To configure FortiMail units operating in HA mode, you usually connect only to the primary unit. The primary unit’s configuration is almost entirely synchronized to secondary units, so that changes made to the primary unit are propagated to the secondary units. The web-based manager of the secondary unit may display “SECONDARY MODE” as a reminder that most configuration changes cannot be made through the secondary unit, but instead must be made through the primary unit. For details, see Banner.

Exceptions to this rule include connecting to a secondary unit in order to view log messages recorded about the secondary unit itself on its own hard disk, and connecting to a secondary unit to configure settings that are not synchronized. For details, see Configuration settings that are not synchronized.

Note

To use FortiGuard Antivirus or FortiGuard Antispam with HA, license all FortiMail units in the cluster. If you license only the primary unit in an active-passive HA group, after a failover, the secondary unit cannot connect to the FortiGuard Antispam service. For FortiMail units in a config-only HA group, only the licensed unit can use the subscription services.

For instructions on how to enable and configure HA, see How to use HA.

See also

How to use HA

About the heartbeat and synchronization

About logging, alert email and SNMP in HA

Storing mail data on a NAS server

Example: Failover scenarios

Example: Active-passive HA group in gateway mode

About the heartbeat and synchronization

Heartbeat and synchronization traffic consists of TCP and UDP packets transmitted between the FortiMail units in the HA group through the primary and secondary heartbeat interfaces.

Note

Service monitoring traffic can also, for short periods, be used as a heartbeat. For details, see Remote services as heartbeat.

Heartbeat and synchronization traffic has three primary functions:

  • to monitor the responsiveness of the HA group members
  • to synchronize configuration changes from the primary unit to the secondary units

    For exceptions to synchronized configuration items, see Configuration settings that are not synchronized.

  • to synchronize mail data from the primary unit to the secondary unit (active-passive only)

    Mail data consists of the FortiMail system mail directory, user home directories, and mail queue.

Note

FortiGuard Antispam packages and FortiGuard Antivirus engines and definitions are not synchronized between primary and secondary units.

When the primary unit’s configuration changes, it immediately synchronizes the change to the secondary unit (or, in a config-only HA group, to the peer units) through the primary heartbeat interface. If this fails, or if you have inadvertently de-synchronized the secondary unit’s configuration, you can manually initiate synchronization. For details, see Using high availability (HA). You can also use the CLI command diagnose system ha sync on either the primary unit or the secondary unit to manually synchronize the configuration. For details, see the FortiMail CLI Reference.

During normal operation, the secondary unit expects to constantly receive heartbeat traffic from the primary unit. Loss of the heartbeat signal interrupts the HA group, and, if it is active-passive in style, generally triggers a failover. For details, see Failover scenario 1: Temporary failure of the primary unit.

Exceptions include system restarts and the execute reload CLI command. In case of a system reboot or reload of the primary unit, the primary unit signals the secondary unit to wait for the primary unit to complete the restart or reload. For details, see Failover scenario 2: System reboot or reload of the primary unit.

Periodically, the secondary unit checks with the primary unit to see if there are any configuration changes on the primary unit. If there are configuration changes, the secondary unit will pull the configuration changes from the primary unit, generate a new configuration, and reload the new configuration. In this case, both the primary and secondary units send alert email. For details, see Failover scenario 3: System reboot or reload of the secondary unit.

Behavior varies by your HA mode when the heartbeat fails:

  • Active-passive HA

A new primary unit is elected: the secondary unit becomes the new primary unit and assumes the duty of processing of email. During the failover, no mail data or configuration changes are lost, but some in-progress email deliveries may be interrupted. These interrupted deliveries may need to be restarted, but most email clients and servers can gracefully handle this. Additional failover behaviors may be configured. For details, see On failure.

Note

Maintain the heartbeat connection. If the heartbeat is accidentally interrupted for an active-passive HA group, such as when a network cable is temporarily disconnected, the secondary unit will assume that the primary unit has failed, and become the new primary unit. If no failure has actually occurred, both FortiMail units will be operating as primary units simultaneously. For details on correcting this, see Using high availability (HA).

  • Config-only HA

Each secondary unit continues to operate normally. However, with no primary unit, changes to the configuration are no longer synchronized. You must manually configure one of the secondary units to operate as the primary unit, synchronizing its changes to the remaining secondary units.

For failover examples and steps required to restore normal operation of the HA group in each case, see Example: Failover scenarios.
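
The heartbeat behavior described in this section can be summarized as a small decision routine. The following Python sketch is a toy model under stated assumptions: the class name, method names, result strings, and the 5-second timeout are hypothetical and do not reflect FortiMail internals or defaults. It only illustrates the three outcomes described above: failover in active-passive HA, waiting during an announced restart or reload, and loss of synchronization in config-only HA.

```python
from dataclasses import dataclass


@dataclass
class SecondaryMonitor:
    """Toy model of a secondary unit watching the primary unit's heartbeat."""
    mode: str                         # "active-passive" or "config-only"
    timeout_s: float = 5.0            # hypothetical heartbeat timeout
    last_heartbeat: float = 0.0       # time at which the last heartbeat arrived
    primary_restarting: bool = False  # set when the primary announces a restart

    def on_heartbeat(self, now: float) -> None:
        """Record a heartbeat; the primary unit is clearly running again."""
        self.last_heartbeat = now
        self.primary_restarting = False

    def on_restart_notice(self) -> None:
        """The primary signalled a reboot or reload: wait instead of failing over."""
        self.primary_restarting = True

    def evaluate(self, now: float) -> str:
        """Decide what the secondary unit should do at time `now`."""
        if now - self.last_heartbeat <= self.timeout_s:
            return "ok"
        if self.primary_restarting:
            return "waiting-for-primary"
        if self.mode == "active-passive":
            return "failover"  # the secondary becomes the new primary unit
        # Config-only: units keep processing email, but configuration changes
        # are no longer synchronized until an administrator promotes a unit.
        return "unsynchronized"
```

A real unit would also bound how long it waits for a restarting primary unit; that limit is omitted here because this section does not specify it.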

HA default ports and protocols

The following default ports are used for HA heartbeat and synchronization. If a firewall sits between the primary and secondary units, make sure that your firewall policies allow these ports:

| Protocol/port | Purpose |
|---|---|
| UDP/20000 | Base port for HA heartbeat signal |
| UDP/20001 | Synchronization control |
| TCP/20002 | File synchronization |
| TCP/20003 | Data synchronization |
| TCP/20004 | Checksum synchronization |
| TCP/25 | HA service monitoring - remote SMTP |
| TCP/80 | HA service monitoring - remote HTTP |
| TCP/110 | HA service monitoring - remote POP3 |
| TCP/143 | HA service monitoring - remote IMAP |
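
When a firewall sits between cluster members, it can help to probe the TCP ports from a host on the far side of the firewall. The following Python sketch is a minimal example, not a Fortinet tool: the `HA_DEFAULT_PORTS` dictionary simply restates the default ports listed above, and the function names are hypothetical. It uses only the standard `socket` module.

```python
import socket

# Default HA ports listed above: (protocol, port) -> purpose.
HA_DEFAULT_PORTS = {
    ("udp", 20000): "Base port for HA heartbeat signal",
    ("udp", 20001): "Synchronization control",
    ("tcp", 20002): "File synchronization",
    ("tcp", 20003): "Data synchronization",
    ("tcp", 20004): "Checksum synchronization",
    ("tcp", 25): "HA service monitoring - remote SMTP",
    ("tcp", 80): "HA service monitoring - remote HTTP",
    ("tcp", 110): "HA service monitoring - remote POP3",
    ("tcp", 143): "HA service monitoring - remote IMAP",
}


def tcp_ports() -> list[int]:
    """Return the TCP ports from the list above, in ascending order."""
    return sorted(port for proto, port in HA_DEFAULT_PORTS if proto == "tcp")


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `[p for p in tcp_ports() if not tcp_port_open("192.0.2.10", p)]` lists the TCP ports that could not be reached on a peer at the placeholder address 192.0.2.10. A failed connection does not prove that the firewall is at fault, because the peer may simply not be listening on that port. The UDP ports cannot be verified with a connection attempt, so check them in the firewall policy itself.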

See also

Configuration settings that are not synchronized

Synchronization of MTA queue directories after a failover

About high availability

About logging, alert email and SNMP in HA

Storing mail data on a NAS server

Configuring the HA mode and group

Configuring service-based failover

Example: Active-passive HA group in gateway mode

Example: Failover scenarios

Configuration settings that are not synchronized

All configuration settings on the primary unit are synchronized to the secondary unit, except the following:

HA settings not synchronized

Operation mode

You must set the operation mode (gateway, transparent, or server) of each HA group member before configuring HA.

Host name

The host name distinguishes members of the cluster. For details, see Host name.

Static route

Static routes are not synchronized because the HA units may be in different networks (see Configuring static routes).

Interface configuration (gateway and server mode only)

Each FortiMail unit in the HA group must be configured with different network interface settings for connectivity purposes. For details, see Configuring the network interfaces.

Exceptions include some active-passive HA settings which affect the interface configuration for failover purposes. These settings are synchronized. For details, see Virtual IP Address.

Management IP address (transparent mode only)

Each FortiMail unit in the HA group should be configured with different management IP addresses for connectivity purposes. For details, see About the management IP.

SNMP system information

Each FortiMail unit in the HA group will have its own SNMP system information, including the Description, Location, and Contact. For details, see Configuring the network interfaces.

RAID configuration

RAID settings are hardware-dependent and determined at boot time by looking at the drives (for software RAID) or the controller (hardware RAID), and are not stored in the system configuration. Therefore, they are not synchronized.

Main HA configuration

The main HA configuration, which includes the HA mode of operation (such as primary or secondary), is not synchronized because this configuration must be different on the primary and secondary units. For details, see Configuring the HA mode and group.

HA Daemon configuration

The following HA daemon settings are not synchronized:

  • Shared password
  • Backup mail data directories
  • Backup MTA queue directories

You must add the shared HA password to each unit in the HA group. All units in the HA group must use the same shared password to identify the group.

Since the mail data and MTA queue backup settings are not synchronized, to use this feature, you must enable it on both the primary and secondary units.

In active-passive HA, the HA daemon options affect how often the secondary unit tests the primary unit and how the secondary unit synchronizes configuration and mail data. Because the HA daemon settings on the secondary unit control how the HA daemon operates, in a functioning HA group you change the HA daemon configuration on the secondary unit to change that behavior. The HA daemon settings on the primary unit do not affect the operation of the HA daemon.

HA service monitoring configuration

In active-passive HA, the HA service monitoring configuration is not synchronized. The remote service monitoring configuration on the secondary unit controls how the secondary unit checks the operation of the primary unit. The local services configuration on the primary unit controls how the primary unit tests its own operation. For details, see Configuring service-based failover.

Note: You might want to have a different service monitoring configuration on the primary and secondary units. For example, after a failover you may not want service monitoring to operate until you have fixed the problems that caused the failover and have restarted normal operation of the HA group.

Product name and icon

The product names and icons under System > Customization > Appearance are not synchronized. All other appearance settings are synchronized.

Config-only HA

In config-only HA, the following settings are not synchronized:

  • the local domain name
  • default certificate
  • iSCSI initiator name
  • iSCSI ID for remote storage
  • SNMP settings
  • IP pools (see Configuring IP pools)
  • the quarantine report host name (see Web release host name/IP)
  • IBE settings of base URL, Help content URL, and About content URL
  • Centralized quarantine client IP address
  • Centralized IBE client IP address

Synchronization of block/safe lists in config-only HA depends on the release:

  • Starting with release 5.4.0, all system-level, domain-level, and user-level block/safe lists are synchronized.
  • Before release 5.4.0, system-level and domain-level block/safe lists are synchronized, but user-level block/safe lists are not.
  • Before release 5.0.2, domain-level block/safe lists are not automatically synchronized either.

Note

User data is synchronized at predefined time intervals, not in real time.

See also

About the heartbeat and synchronization

Synchronization of MTA queue directories after a failover

During normal operation, email messages are in one of three states:

  • being received or sent by the primary unit
  • waiting to be delivered in the mail queue
  • stored on the primary unit’s mail data directories (email quarantines, email archives, and email inboxes of server mode)

When normal operation of an active-passive HA group is interrupted and a failover occurs, sending and receiving is interrupted. The delivery attempt fails, and the sender usually retries sending the email message. However, stored messages remain in the primary unit’s mail data directories.

You usually should configure HA to synchronize the stored mail data to prevent loss of email messages, but you usually will not want to regularly synchronize the mail queue. This is because, to prevent loss of email messages in the failed primary unit, FortiMail units in active-passive HA use the following failover mechanism:

Note

If the failed primary unit’s effective HA operating mode is failed, a sequence similar to the following occurs automatically when the problem that caused the failure is corrected.

  1. The secondary unit detects the failure of the primary unit, and becomes the new primary unit.
  2. The former primary unit restarts, detects the new primary unit, and becomes a secondary unit.

     Note: You may have to manually restart the failed primary unit.

  3. The former primary unit pushes its mail queue to the new primary unit.

     This synchronization occurs through the heartbeat link between the primary and secondary units, and prevents duplicate email messages from forming in the primary unit’s mail queue.

  4. The new primary unit delivers email in its mail queues, including email messages synchronized from the new secondary unit.

As a result, as long as the failed primary unit can restart, no email is lost from the mail queue.

Even if you choose to synchronize the mail queue, because its contents change very rapidly and synchronization is periodic, there is a chance that some email in these directories will not be synchronized at the exact moment a failover occurs.
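
The queue push in the failover sequence above amounts to a merge that avoids duplicates. The following Python sketch illustrates that idea only: modelling a queue as a dictionary keyed by message ID is an assumption of this example, and the function name is hypothetical. The guide does not describe how FortiMail identifies duplicate messages internally.

```python
def merge_mail_queues(new_primary_queue: dict[str, str],
                      former_primary_queue: dict[str, str]) -> dict[str, str]:
    """Merge the former primary unit's mail queue into the new primary unit's queue.

    Queues are modelled as {message_id: payload}. Keying on a message ID
    (an assumption of this sketch) means that a message present on both
    units is kept once rather than duplicated.
    """
    merged = dict(new_primary_queue)  # copy, so the input is not modified
    for message_id, payload in former_primary_queue.items():
        merged.setdefault(message_id, payload)
    return merged
```

In this model, a message that was queued on the former primary unit at the moment of failover appears exactly once in the merged queue, whether or not an earlier periodic synchronization had already copied it.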

See also

About the heartbeat and synchronization

About logging, alert email and SNMP in HA

To configure logging and alert email, configure the primary unit and enable HA events. When the configuration changes are synchronized to the secondary units, all FortiMail units in the HA group record their own separate log messages and send separate alert email messages. Log data is not synchronized. For details on configuring logging and viewing log messages, see Logs, reports and alerts.

Note

To distinguish alert email from each member of the HA cluster, configure a different host name for each member. For details, see Host name.

To use SNMP, configure each cluster member separately and enable HA events for the community. If you enable SNMP for all units, they can all send SNMP traps. Additionally, you can use an SNMP server to monitor the primary and secondary units for HA settings, such as the HA configured and effective mode of operation. For details on SNMP, see Configuring the network interfaces.

Note

To aid in quick discovery and diagnosis of network problems, consider configuring SNMP, Syslog, and/or alert email to monitor the HA cluster for failover messages.

See also

Getting HA information using SNMP

About the heartbeat and synchronization

Getting HA information using SNMP

You can use an SNMP manager to get information about how FortiMail HA is operating. The FortiMail MIB (fortimail.mib) and the FortiMail trap MIB (fortimail.trap.mib) include the HA fields listed below.

FortiMail MIB fields

| MIB file | MIB field | Description |
|---|---|---|
| fortimail.mib | fmlHAEventId | Provides the ID of the most recent HA event. |
| fortimail.mib | fmlHAUnitIp | Provides the IP address of the port1 interface of the FortiMail unit on which an HA event occurred. |
| fortimail.mib | fmlHAEventReason | Provides the description of the reason for the HA event. |
| fortimail.mib | fmlHAMode | Provides the configured HA mode of operation of the FortiMail unit (either primary or secondary). |
| fortimail.mib | fmlHAEffectiveMode | Provides the effective HA mode of operation (applies to active-passive HA only), either as the primary unit or as the secondary unit. The effective HA mode of operation matches the configured mode of operation unless a failure has occurred. |
| fortimail.trap.mib | fmlTrapHAEvent | Provides the FortiMail HA trap that is sent when an HA event occurs. This trap includes the contents of the fmlSysSerial, fmlHAEventId, fmlHAUnitIp, and fmlHAEventReason MIB fields. |

How to use HA

In general, to enable and configure HA, you should perform the following:

  1. If the HA cluster will use FortiGuard Antivirus and/or FortiGuard Antispam services, license all FortiMail units in the HA group for the FortiGuard Antispam and FortiGuard Antivirus services, and register them with the Fortinet Technical Support web site, https://support.fortinet.com/.
  2. Physically connect the FortiMail units that will be members of the HA cluster.

     You must connect at least one of their network interfaces for heartbeat and synchronization traffic between members of the cluster. For reliability reasons, Fortinet recommends that you connect both a primary and a secondary heartbeat interface, and that they be connected directly or through a dedicated switch that is not connected to your overall network.

  3. For config-only clusters, configure each member of the cluster to store mail data on a NAS server that supports NFS connections (active-passive groups may also use a NAS server, but do not require it). For details, see Selecting the mail data storage location.
  4. On each member of the cluster:
     • Enable the HA mode that you want to use (either active-passive or config-only) and select whether the individual member will act as a primary unit or secondary unit within the cluster. For information about the differences between the HA modes, see About high availability.
     • Configure the local IP addresses of the primary and secondary heartbeat and synchronization network interfaces.
     • For active-passive clusters, configure the behavior on failover, and how the network interfaces should be configured for whichever FortiMail unit is currently acting as the primary unit. Additionally, if the FortiMail units store mail data on a NAS, disable mail data synchronization between members.
     • For config-only clusters, if the FortiMail unit is a primary unit, configure the IP addresses of its secondary units; if the FortiMail unit is a secondary unit, configure the IP address of its primary unit.

     For details, see Configuring the HA mode and group.

  5. If the HA cluster is active-passive and you want to trigger failover when hardware or a service fails, even if the heartbeat connection is still functioning, configure service monitoring. For details, see Configuring service-based failover.
  6. Monitor the status of each cluster member. For details, see Monitoring the HA status. To monitor HA events through log messages and/or alert email, you must first enable logging of HA activity events. For details, see Logs, reports and alerts.

See also

    About the heartbeat and synchronization

    Centrally monitoring the HA cluster

    Monitoring the HA status

    The Status tab in the High Availability submenu shows the configured HA mode of operation of a FortiMail unit in an HA group. You can also manually initiate synchronization and reset the HA mode of operation. A reset may be required if a FortiMail unit’s effective HA mode of operation differs from its configured HA mode of operation, such as after a failover when a configured primary unit is currently acting as a secondary unit.

    For FortiMail units operating as secondary units, the Status tab also lets you view the status and schedule of the HA synchronization daemon.

    Appearance of the Status tab varies by:

    • whether the HA group is active-passive or config-only
    • whether the FortiMail unit is configured as a primary unit or secondary unit
    • whether a failover has occurred (active-passive only)

    If HA is disabled, this tab displays:

    HA mode is currently disabled

    Before you can use the Status tab, you must first enable and configure HA. For details, see Configuring the HA mode and group.

    To view the HA mode of operation status, go to System > High Availability > Status.

    Viewing HA status

    GUI item

    Description

    Configured Operating Mode

    Displays the HA operating mode that you configured, either:

    • Primary: Configured to be the primary unit of an active-passive group.
    • Secondary: Configured to be the secondary unit of an active-passive group.
    • Config-primary: Configured to be the primary unit of a config-only group.
    • Config-secondary: Configured to be a secondary unit of a config-only group.

    For information on configuring the HA operating mode, see HA mode.

    After a failure, the FortiMail unit may not be acting in its configured HA operating mode. For details, see Using high availability (HA).

    Effective Operating Mode

    Displays the mode that the unit is currently operating in, either:

    • Primary: Acting as primary unit.
    • Secondary: Acting as secondary unit.
    • Off: For primary units, this indicates that service/interface monitoring has detected a failure and has taken the primary unit offline, triggering failover. For secondary units, this indicates that synchronization has failed once; a subsequent failure will trigger failover. For details, see On failure and Using high availability (HA).
    • Failed: Service/network interface monitoring has detected a failure and the diagnostic connection is currently determining whether the problem has been corrected or failover is required. For details, see On failure.

    The configured HA operating mode matches the effective operating mode unless a failure has occurred.

    For example, after a failover, a FortiMail unit configured to operate as a secondary unit could be acting as a primary unit.

    For explanations of combinations of configured and effective HA modes of operation, see Monitoring the HA status. For information on restoring the FortiMail unit to an effective HA operating mode that matches the configured operating mode, see Using high availability (HA).

    This option appears only if the FortiMail unit is a member of an active-passive HA group.

    Detail Status

    This table is viewable, when HA is configured, by all HA units (primary/secondary, and config-primary/config-secondary):

    • IP: IP address of HA cluster members.

    • SN: Serial number of HA cluster member.

    • Secondary: Displays the configuration synchronization status of the secondary/config-secondary unit.

    • Primary: Displays the configuration synchronization status of the primary/config-primary unit.

    • Status: Displays whether or not the HA cluster is synchronized.

    • Last Seen: Displays the last time the primary unit’s HA daemon checked to make sure that the secondary unit is operating correctly. Monitoring occurs through the heartbeat link between the primary and secondary units. For details, see HA base port.

    Action

    Displays the actions you can take, depending on the context:

    • Start configuration sync: Click to manually initiate synchronization of the configurations. For information on items that are not synchronized, see Configuration settings that are not synchronized.

    • Switch to secondary/primary mode: Option depends on HA unit's role; click to manually switch the effective HA operating mode of the primary unit so that it becomes a secondary unit, or vice versa.

    • Restart the HA system: Click to restart HA processes after they have been halted due to detection of a failure by service monitoring. For details, see On failure, Configuring service-based failover, and Restarting the HA processes on a stopped primary unit.

      This option appears only if the FortiMail unit is configured to operate as the primary unit, but its effective HA operating mode is off.

    Combinations of configured and effective HA modes of operation

    | Configured operating mode | Effective operating mode | Description |
    |---|---|---|
    | Primary | Primary | Normal for the primary unit of an active-passive HA group. |
    | Secondary | Secondary | Normal for the secondary unit of an active-passive HA group. |
    | Primary | Off | The primary unit has experienced a failure, or the FortiMail unit is in the process of switching to operating in HA mode. HA processes and email processing are stopped. |
    | Secondary | Off | The secondary unit has detected a failure, or the FortiMail unit is in the process of switching to operating in HA mode. After the secondary unit starts up and connects with the primary unit to form an HA group, the first configuration synchronization may fail in special circumstances. To prevent both the secondary and primary units from simultaneously acting as primary units, the effective HA mode of operation becomes off. If subsequent synchronization fails, the secondary unit’s effective HA mode of operation becomes primary. |
    | Primary | Failed | The remote service monitoring or local network interface monitoring on the primary unit has detected a failure, and will attempt to connect to the other FortiMail unit. If the problem that caused the failure has been corrected, the effective HA mode of operation switches from failed to secondary, or to match the configured HA mode of operation, depending on the On failure setting. Additionally, if the HA group is operating in transparent mode, and if the effective HA mode of operation changes to failed, the network interface IP/netmask on the secondary unit displays bridging (waiting for recovery). For details, see Configuring the network interfaces. |
    | Primary | Secondary | The primary unit has experienced a failure but then returned to operation. When the failure occurred, the unit configured to be the secondary unit became the primary unit. When the unit configured to be the primary unit restarted, it detected the new primary unit and so switched to operating as the secondary unit. |
    | Secondary | Primary | The secondary unit has detected that the FortiMail unit configured to be the primary unit failed. When the failure occurred, the unit configured to be the secondary unit became the primary unit. |
    | Config-primary | N/A | Normal for the primary unit of a config-only HA group. |
    | Config-secondary | N/A | Normal for the secondary unit of a config-only HA group. |
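
If you poll the configured and effective modes from several units, for example through the SNMP fields described earlier, the combinations above can be turned into a lookup. The following Python sketch is illustrative: the dictionary condenses the descriptions above into short summaries, and the function names and lowercase mode strings are hypothetical conventions of this example, not values returned by FortiMail.

```python
# (configured mode, effective mode) -> short interpretation, condensed from
# the combinations above. Effective mode is None for config-only groups.
HA_MODE_COMBINATIONS = {
    ("primary", "primary"): "normal",
    ("secondary", "secondary"): "normal",
    ("primary", "off"): "primary failed or is switching to HA mode; HA and email processing stopped",
    ("secondary", "off"): "secondary detected a failure or is switching to HA mode",
    ("primary", "failed"): "monitoring on the primary detected a failure; awaiting recovery",
    ("primary", "secondary"): "primary failed and returned; now acting as secondary",
    ("secondary", "primary"): "configured primary failed; secondary took over",
    ("config-primary", None): "normal",
    ("config-secondary", None): "normal",
}


def describe_ha_state(configured, effective=None):
    """Return a short interpretation of a configured/effective mode pair."""
    key = (configured.lower(), effective.lower() if effective else None)
    return HA_MODE_COMBINATIONS.get(key, "combination not listed")


def needs_attention(configured, effective=None):
    """Return True for any pair that is not a normal operating state."""
    return describe_ha_state(configured, effective) != "normal"
```

Treating every unlisted pair as needing attention is a deliberately cautious design choice for a monitoring script.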

    See also

    About the heartbeat and synchronization

    About logging, alert email and SNMP in HA

    Storing mail data on a NAS server

    Configuring the HA mode and group

    Configuring service-based failover

    Example: Active-passive HA group in gateway mode

    Example: Failover scenarios

    Restarting the HA processes on a stopped primary unit

    If you configured service monitoring on an active-passive HA group (see Configuring service-based failover) and either the primary unit or the secondary unit detects a service failure on the primary unit, the primary unit changes its effective HA mode of operation to off, stops processing email, and halts all of its HA processes.

    After resolving the problem that caused the failure, you can use the following steps to restart the HA processes on the primary unit. For example, resolving the problem could be as simple as reconnecting the cable to the port2 network interface.

    To restart a stopped primary unit
    1. Log in to the web-based manager of the primary unit.
    2. Go to System > High Availability > Status.
    3. Under Action, click Restart the HA system.

       The primary unit restarts and rejoins the HA group.

    If a failover has occurred due to processes being stopped on the primary unit, and the secondary unit is currently acting as the primary unit, you can restore the primary and secondary units to acting in their configured roles. For details, see Using high availability (HA).

    See also

    Monitoring the HA status

    Configuring service-based failover

    Example: Active-passive HA group in gateway mode

    Configuring the HA mode and group

    The Configuration tab in the System > High Availability submenu lets you configure the high availability (HA) options, including:

    • enabling HA
    • selecting whether the HA group is active-passive or config-only in style
    • whether this individual FortiMail unit will act as a primary unit or a secondary unit in the cluster
    • network interfaces that will be used for heartbeat and synchronization
    • service monitor
    Caution

    For config-only HA, if the FortiMail unit is operating in server mode, you must store mail data externally, on a NAS server. Failure to store mail data externally could result in mailboxes and other data scattered over multiple FortiMail units. For details on configuring NAS, see Storing mail data on a NAS server and Selecting the mail data storage location.

    For an explanation of active-passive and config-only, see About high availability.

    HA settings, with the exception of Virtual IP Address settings, are not synchronized and must be configured separately on each primary and secondary unit.

    You must maintain the physical link between the heartbeat and synchronization network interfaces. These connections enable cluster members to detect the responsiveness of other members, and to synchronize data. If they are interrupted, normal operation will be interrupted and, for active-passive HA groups, a failover will occur. For more information on heartbeat and synchronization, see About the heartbeat and synchronization.

    For an active-passive HA group, or a config-only HA group consisting of only two FortiMail units, directly connect the heartbeat network interfaces using a crossover Ethernet cable. For a config-only HA group consisting of more than two FortiMail units, connect the heartbeat network interfaces through a switch, and do not connect this switch to your overall network.

    To configure HA options
    1. Go to System > High Availability > Configuration.

       The appearance of sections and the options in them vary greatly with your choice in the Mode of operation drop-down list.

    2. Configure the sections that apply to your HA mode.
    3. Click Apply.

    Configuring the primary HA options

    Go to System > High Availability > Configuration and click the arrow to expand the HA configuration section, if needed. The options presented vary greatly depending on your choice in the Mode of operation drop-down list.

    HA main options

    GUI item

    Description

    HA mode

    Enables or disables HA, selects active-passive or config-only HA, and selects the initial configured role of this FortiMail unit in the HA group.

    • Off: The FortiMail unit is not operating in HA mode.
    • Primary: The FortiMail unit is the primary unit in an active-passive HA group.
    • Secondary: The FortiMail unit is the secondary unit in an active-passive HA group.
    • Config-primary: The FortiMail unit is the primary unit in a config-only HA group.
    • Config-secondary: The FortiMail unit is a secondary unit in a config-only HA group.

    On failure

    Select one of the following behaviors of the primary unit when it detects a failure, such as on a power failure or from service/interface monitoring.

    • switch off: Do not process email or join the HA group until you manually select the effective operating mode (see Using high availability (HA)).
    • wait for recovery then restore original role: On recovery, the failed primary unit’s effective HA mode of operation resumes its configured primary role. This also means that the secondary unit needs to give back the primary role to the primary unit. This behavior may be useful if the cause of failure is temporary and rare, but may cause problems if the cause of failure is permanent or persistent.
    • wait for recovery then restore secondary role: On recovery, the failed primary unit’s effective HA mode of operation becomes secondary, and the secondary unit continues to assume the primary role. The former primary unit then synchronizes the content of its MTA queue directories with the current primary unit. The new primary unit can then deliver email that existed in the former primary unit’s MTA queue at the time of the failover. For information on manually restoring the FortiMail unit to acting in its configured HA mode of operation, see Using high availability (HA).

    In most cases, you should select the wait for recovery then restore secondary role option.

    This option appears only if HA mode is primary.

    Shared password

    Enter an HA password for the HA group. You must configure the same Shared password value on both the primary and secondary units.

    Enable centralized monitor

    Enable or disable the central statistics service.

    Once enabled, administrators on the primary HA unit can monitor the state and activity of each HA cluster member, including CPU, memory, and disk usage, email throughput, and other statistic summaries.

    This feature can also be enabled in the CLI by enabling central-statistics under config system ha. For more information, see the FortiMail CLI Reference.

    For more information, see Centrally monitoring the HA cluster.

    Configuring the primary configuration IP

    If you are configuring the unit as the secondary unit in a config-only group, go to System > High Availability > Configuration to configure the primary IP address.

    In the Primary IP address field, enter the IP of the primary heartbeat network interface of the primary unit. The secondary unit synchronizes only with this primary unit’s IP address.

    Configuring the advanced options

    Go to System > High Availability > Configuration to configure the advanced options.

    For config-only groups, just the HA base port option appears.

    HA advanced options

    GUI item

    Description

    Synchronize mail data directory

    Synchronize system quarantine, email archives, email users’ mailboxes (server mode only), preferences, and per-recipient quarantines.

    Unless the HA cluster stores its mail data on a NAS server, you should configure the HA cluster to synchronize mail directories.

    If mail data changes frequently, you can manually initiate a data synchronization when significant changes are complete. For details, see Using high availability (HA).

    Synchronize MTA queue directory

    Synchronize the mail queue of the FortiMail unit. For more information on the mail queue, see Managing the mail queue.

    Caution: If the primary unit experiences a hardware failure and you cannot restart it, and if this option is disabled, MTA queue directory data could be lost.

    Note: Enabling this option can affect the FortiMail unit’s performance, because periodic synchronization of the mail queue can be processor and bandwidth-intensive. Additionally, because the content of the MTA queue directories is very dynamic, periodically synchronizing MTA queue directories between FortiMail units may not guarantee against loss of all email in those directories. Even if MTA queue directory synchronization is disabled, after a failover, a separate synchronization mechanism may successfully prevent loss of MTA queue data. For details, see Synchronization of MTA queue directories after a failover.

    HA base port

    Enter the first of four TCP port numbers that will be used for:

    • the heartbeat signal
    • synchronization control
    • data synchronization
    • configuration synchronization

    Note: For active-passive groups, you can configure service monitoring in addition to, or as an alternative to, the heartbeat. For details, see Configuring service-based failover.

    Note: In addition to automatic immediate and periodic configuration synchronization, you can also manually initiate synchronization. For details, see Using high availability (HA).

    Heartbeat lost threshold

    Enter the total span of time, in seconds, for which the primary unit can be unresponsive before it triggers a failover and the secondary unit assumes the role of the primary unit.

    The heartbeat will continue to check for availability once per second. To prevent premature failover when the primary unit is simply experiencing very heavy load, configure a total threshold of three (3) seconds or more to allow the secondary unit enough time to confirm unresponsiveness by sending additional heartbeat signals.

    Note: If the failure detection time is too short, the secondary unit may falsely detect a failure during periods of high load.

    Caution: If the failure detection time is too long, the primary unit could fail and a delay in detecting the failure could mean that email is delayed or lost. Decrease the failure detection time if email is delayed or lost because of an HA failover.

    Remote services as heartbeat

    Enable to use remote service monitoring as a secondary HA heartbeat. If enabled and both the primary and secondary heartbeat links fail or become disconnected, if remote service monitoring still detects that the primary unit is available, a failover will not occur.

    Note: The remote service check applies only to temporary heartbeat link failures. If the HA process restarts due to a system reboot or HA daemon restart, physical heartbeat connections will be checked first. If the physical connections are not found, remote service monitoring no longer takes effect.

    Note: Using remote services as heartbeat provides HA heartbeat only, not synchronization. To avoid synchronization problems, you should not use remote service monitoring as a heartbeat for extended periods. This feature is intended only as a temporary heartbeat solution that operates until you reestablish a normal primary or secondary heartbeat link.
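The interaction between the Heartbeat lost threshold and Remote services as heartbeat settings can be sketched as a small decision function. This is an illustrative model only, not FortiMail's implementation: the function and parameter names are hypothetical, and the sketch assumes that a failover is triggered as soon as the primary unit has been silent for the full threshold.

```python
def should_fail_over(seconds_since_last_heartbeat: float,
                     lost_threshold: int,
                     remote_services_as_heartbeat: bool = False,
                     remote_service_reachable: bool = False) -> bool:
    """Return True if the secondary unit should assume the primary role.

    All names here are hypothetical; they only mirror the GUI options
    described in the table above.
    """
    # Heartbeat is still within the allowed window: no failover.
    if seconds_since_last_heartbeat < lost_threshold:
        return False
    # Heartbeat links are silent, but remote service monitoring still
    # sees the primary unit, so it acts as a temporary heartbeat.
    if remote_services_as_heartbeat and remote_service_reachable:
        return False
    return True
```

With a threshold of 3 seconds, a primary unit that misses two one-second checks under heavy load does not cause a failover, while one that stays silent for 3 seconds does, unless remote service monitoring is enabled and can still reach it.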

    Configuring the secondary system options

    This section appears only when the mode of operation is set to config-primary under System > High Availability > Configuration.

    HA peer options

    GUI item

    Description

    IP address

    Double-click in order to modify, then enter the IP address of the primary network interface on that secondary unit.

    Create

    Click to add a secondary unit to the list of Peer systems, then double-click its IP address.

    The primary unit synchronizes only with secondary units in the list of Peer systems.

    Delete

    Click the row corresponding to a peer IP address, then click this button to remove that secondary unit from the HA group.

    See also

    About the heartbeat and synchronization

    About logging, alert email and SNMP in HA

    Storing mail data on a NAS server

    Configuring service-based failover

    Example: Active-passive HA group in gateway mode

    Example: Failover scenarios

    Storing mail data on a NAS server

    For FortiMail units operating in server mode as a config-only HA group, you must store mail data on a NAS server instead of locally. If mail data is stored locally, email users’ messages and other mail data could be scattered across multiple FortiMail units.

    Even if your FortiMail units are not operating in server mode with config-only HA, however, storing mail data on a NAS server may have a number of benefits for your organization. For example, backing up your NAS server regularly can help prevent loss of mail data. Also, if your FortiMail unit experiences a temporary failure, you can still access the mail data on the NAS server. When the FortiMail unit restarts, it can usually continue to access and use the mail data stored on the NAS server.

    For config-only HA groups using a network attached storage (NAS) server, only the primary unit sends quarantine reports to email users. The primary unit also acts as a proxy between email users and the NAS server when email users use FortiMail webmail to access quarantined email and to configure their own Bayesian filters.

    For active-passive HA groups, the primary unit reads and writes all mail data to and from the NAS server in the same way as a standalone unit. If a failover occurs, the new primary unit uses the same NAS server for mail data. The new primary unit can access all mail data that the original primary unit stored on the NAS server. So if you are using a NAS server to store mail data, after a failover, the new primary unit continues operating with no loss of mail data.

    Note

    If the FortiMail unit is a member of an active-passive HA group, and the HA group stores mail data on a remote NAS server, disable mail data synchronization to prevent duplicate mail data traffic.

    For instructions on storing mail data on a NAS server, see Selecting the mail data storage location.

    See also

    About the heartbeat and synchronization

    Configuring the HA mode and group

    Configuring interface monitoring

    In active-passive HA mode, Interface monitor checks the local interfaces on the primary unit. If a malfunctioning interface is detected, a failover will be triggered.

    To configure interface monitoring
    1. Go to System > High Availability > Configuration.
    2. Select primary or secondary as the mode of operation.
    3. Expand the Interface area, if required.
    4. Click on the port/interface name to configure the interface. For details, see Configuring the network interfaces.

      Note

      The interface IP address must be different from, but on the same subnet as, the IP addresses of the other heartbeat network interfaces of other members in the HA group.

      When configuring other FortiMail units in the HA group, use this value as the:

      • Remote peer IP (for active-passive groups)
      • Primary configuration (for secondary units in config-only groups)
      • Peer systems (for the primary unit on config-only groups)

    5. Select a row in the table and click Edit to configure the following HA settings on the interface.

    GUI item

    Description

    Port

    Displays the interface name you’re configuring.

    Enable port monitor

    Enable to monitor a network interface for failure. If the port fails, the primary unit will trigger a failover.

    Heartbeat status

    Specify if this interface will be used for HA heartbeat and synchronization.

    • Disable

    Do not use this interface for HA heartbeat and synchronization.

    • Primary

    Select the primary network interface for heartbeat and synchronization traffic. For more information, see About the heartbeat and synchronization.

    This network interface must be connected directly or through a switch to the Primary heartbeat network interface of other members in the HA group.

    • Secondary

    Select the secondary network interface for heartbeat and synchronization traffic. For more information, see About the heartbeat and synchronization.

    The secondary heartbeat interface is the backup heartbeat link between the units in the HA group. If the primary heartbeat link is functioning, the secondary heartbeat link is used for the HA heartbeat. If the primary heartbeat link fails, the secondary link is used for the HA heartbeat and for HA synchronization.

    This network interface must be connected directly or through a switch to the Secondary heartbeat network interfaces of other members in the HA group.

    Caution: Using the same network interface for both HA synchronization/heartbeat traffic and other network traffic could result in issues with heartbeat and synchronization during times of high traffic load, and is not recommended.

    Note: In general, you should isolate the network interfaces that are used for heartbeat traffic from your overall network. Heartbeat and synchronization packets contain sensitive configuration information, are latency-sensitive, and can consume considerable network bandwidth.

    Peer IP address

    Enter the IP address of the matching heartbeat network interface of the other member of the HA group.

    For example, if you are configuring the primary unit’s primary heartbeat network interface, enter the IP address of the secondary unit’s primary heartbeat network interface.

    Similarly, for the secondary heartbeat network interface, enter the IP address of the other unit’s secondary heartbeat network interface.

    For information about configuration synchronization and what is not synchronized, see About the heartbeat and synchronization.

    This option appears only for active-passive HA.

    Peer IPv6 address

    Enter the peer IPv6 address in the active-passive HA group. For IPv6 support, see About IPv6 Support.

    Virtual IP action

    Select whether and how to configure the IP addresses and netmasks of the FortiMail unit whose effective HA mode of operation is currently primary.

    For example, a primary unit might be configured to receive email traffic through port1 and receive heartbeat and synchronization traffic through port5 and port6. In that case, you would configure the primary unit to set the IP addresses or add virtual IP addresses for port1 of the secondary unit on failover in order to mimic that of the primary unit.

    • Ignore: Do not change the network interface configuration on failover, and do not monitor. For details on service monitoring for network interfaces, see Configuring the network interfaces.
    • Set: Add the specified virtual IP address and netmask to the network interface on failover. Normally, you will configure your network (MX records, firewall policies, routing and so on) so that clients and mail services use the virtual IP address. Both originating and reply traffic uses the virtual IP address. All replies to sessions with the virtual IP address include the virtual IP address as the source address. Originating traffic, however, will use the network interface’s actual IP address as the source address. Unlike set interface IP/netmask, this option results in the network interface having two IP addresses: the actual and the virtual. For examples, see Example: Active-passive HA group in gateway mode. In v3.0 MR2 and older releases, the behavior is different -- the originating traffic uses the actual IP address, instead of the virtual IP address.
    • Bridge: Include the network interface in the Layer 2 bridge. While the effective HA mode of operation is secondary, the interface is deactivated and cannot process traffic, preventing Layer 2 loops. Then, when the effective HA mode of operation becomes primary, the interface is activated again and can process traffic. This option appears only if the FortiMail unit is operating in transparent mode. This option is not available for Port1 and the ports not in the bridge group. For information on configuring bridging network interfaces, see Editing network interfaces.

    Note: Settings in this section are synchronizable. Configure the primary unit, then synchronize it to the secondary unit. For details, see Using high availability (HA).

    Virtual IP address

    Enter the virtual IPv4 address for this interface.

    Virtual IPv6 address

    Enter the virtual IPv6 address for this interface. For IPv6 support, see About IPv6 Support.
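The requirement that a heartbeat interface address be different from, but on the same subnet as, the matching heartbeat interfaces of the other HA members can be checked before you enter the values. The following is a minimal sketch using Python's standard ipaddress module; the function name and the sample addresses are hypothetical and are not taken from any FortiMail tool.

```python
import ipaddress


def check_heartbeat_addresses(local: str, peers: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the addresses look consistent.

    `local` is this unit's heartbeat interface in CIDR form, e.g. "10.0.0.1/24".
    `peers` are the IP addresses of the matching heartbeat interfaces on peers.
    """
    iface = ipaddress.ip_interface(local)
    problems = []
    for peer in peers:
        addr = ipaddress.ip_address(peer)
        if addr == iface.ip:
            # The peer must not reuse this unit's own heartbeat address.
            problems.append(f"{peer} duplicates the local address")
        elif addr not in iface.network:
            # The peer must be reachable on the same subnet.
            problems.append(f"{peer} is outside {iface.network}")
    return problems
```

For example, `check_heartbeat_addresses("10.0.0.1/24", ["10.0.0.2"])` returns an empty list, while a peer address on another subnet or one identical to the local address is reported as a problem.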

    Configuring service-based failover

    Go to System > High Availability > Configuration to configure remote service monitoring, local network interface monitoring, and local hard drive monitoring.

    Note

    Service monitoring is not available for config-only HA groups.

    HA service monitoring settings are not synchronized and must be configured separately on each primary and secondary unit.

    With remote service monitoring, the secondary unit confirms that it can connect to the primary unit over the network using SMTP service, POP service (POP3), and Web service (HTTP) connections. If you configure the HA pair in server mode, the IMAP service can also be checked.

    With local network interface monitoring and local hard drive monitoring, the primary unit monitors its own network interfaces and hard drives.

    If service monitoring detects a failure, the effective HA operating mode of the primary unit switches to off or failed (depending on the On failure setting) and, if configured, the FortiMail units send HA event alert email, record HA event log messages, and send HA event SNMP traps. A failover then occurs, and the effective HA operating mode of the secondary unit switches to primary. For information on the On failure option, see Configuring the HA mode and group. For information on the effective HA operating mode, see Monitoring the HA status.

    For example, if service monitoring detects that port2 on the primary unit has failed, the primary unit records a log message similar to the following.

    date=2005-11-18 time=18:20:31 device_id=FE-4002905500194 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: local problem detected (port2), shutting down"

    The primary unit also sends an alert email similar to the following:

    Subject: monitord: local problem detected (port2), shutting down [primary-host-name]

    This is the FortiMail HA unit at 10.0.0.1.

    A local problem (port2) has been detected, telling remote to take over and shutting down.

    Remote service monitoring can be useful in addition to, or sometimes as a backup alternative to, the heartbeat. While the heartbeat tests for the general responsiveness of the primary unit, it does not test for the failure of individual services that email users may be using, such as POP3 or webmail. The heartbeat also does not monitor for the failure of network interfaces through which non-heartbeat traffic occurs. In this way, configuring remote service monitoring provides more specific failover monitoring. Additionally, if the heartbeat link is briefly disconnected, enabling HA services monitoring can prevent a false failover by acting as a temporary secondary heartbeat. For information on treating service monitoring as a secondary heartbeat, see Remote services as heartbeat.

    To configure service monitoring
    1. Go to System > High Availability > Configuration.
    2. Select primary or secondary as the mode of operation.
    3. Expand the service monitor area, if required.
    4. Select a row in the table and click Edit to configure it.
    5. For Remote SMTP, Remote IMAP, Remote POP, and Remote HTTP services, configure the following:

      GUI item

      Description

      Enable

      Select to enable connection responsiveness tests for the service.

      Name

      Displays the service name.

      Remote IP

      Enter the peer IP address.

      Port

      Enter the port number of the peer service.

      Timeout

      Enter the timeout period for one connection test.

      Interval

      Enter the frequency of the tests.

      Retries

      Enter the number of consecutively failed tests that are allowed before the primary unit is deemed unresponsive and a failover occurs.

    6. For interface monitoring and local hard drive monitoring, configure the following:

      GUI item

      Description

      Enable

      Enable local hard drive monitoring to check if the local hard drive is still accessible, or if the mail data disk is almost full. If the hard disk is not responsive, or if the mail data disk is 95 percent full, a failover will occur.

      Interface monitoring is enabled when you configure interface monitoring. See Configuring interface monitoring.

      Network interface monitoring tests all active network interfaces whose:

      • Virtual IP action setting is not Ignore
      • Enable port monitor setting is enabled

      For details, see Configuring interface monitoring and Virtual IP action.

      Interval

      Enter the frequency of the test.

      Retries

      Specify the number of consecutively failed tests that are allowed before the local interface or hard drive is deemed unresponsive and a failover occurs.
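The way Interval and Retries combine can be illustrated with a counter of consecutive failed tests. This is a sketch under a stated assumption: the text says only that Retries is the number of consecutive failures that are allowed, so the sketch treats the next failure after that allowance as the one that triggers failover. The function name is hypothetical.

```python
def first_failover_test(results, retries):
    """Return the 1-based index of the test that triggers failover, or None.

    `results` is a sequence of booleans, one per test interval (True = passed).
    `retries` is the number of consecutive failures tolerated (assumption:
    failover happens on the failure that exceeds this allowance).
    """
    consecutive = 0
    for index, passed in enumerate(results, start=1):
        # A passing test resets the streak; a failing one extends it.
        consecutive = 0 if passed else consecutive + 1
        if consecutive > retries:
            return index
    return None
```

A single passing test resets the count, so intermittent failures that never exceed the allowance in a row do not cause a failover.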

    See also

    About the heartbeat and synchronization

    About logging, alert email and SNMP in HA

    Storing mail data on a NAS server

    Configuring the HA mode and group

    Example: Active-passive HA group in gateway mode

    Example: Failover scenarios

    Example: Failover scenarios

    This section describes basic FortiMail active-passive HA failover scenarios. For each scenario, refer to the HA group shown in the following figure. To simplify the descriptions of these scenarios, the following abbreviations are used:

    • P1 is the configured primary unit.
    • S2 is the configured secondary unit.
    Example active-passive HA group

    This section contains the following HA failover scenarios:

    Failover scenario 1: Temporary failure of the primary unit

    In this scenario, the primary unit (P1) fails because of a software failure or a recoverable hardware failure (in this example, the P1 power cable is unplugged). HA logging and alert email are configured for the HA group.

    When the secondary unit (S2) detects that P1 has failed, S2 becomes the new primary unit and continues processing email.

    Here is what happens during this process:

    1. The FortiMail HA group is operating normally.
    2. The power is accidentally disconnected from P1.
    3. S2’s primary heartbeat test detects that P1 has failed.
      How soon this happens depends on the HA daemon configuration of S2.

    4. The effective HA operating mode of S2 changes to primary.
    5. S2 sends an alert email similar to the following, indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to primary.

      This is the HA machine at 172.16.5.11.

      The following event has occurred

      ‘PRIMARY heartbeat disappeared’

      The state changed from ‘SECONDARY’ to ‘PRIMARY’

    6. S2 records the following event log messages (among others) indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to primary.

      2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: peer stop responding (heartbeat), assuming PRIMARY role"

      2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop stopping"

      2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop stopping"

      2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop stopping"

      2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop starting, entering primary mode"

      2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop starting, entering primary mode"

      2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop starting, entering PRIMARY mode"
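If you collect these HA event log messages for review, the key=value layout shown above is straightforward to split into fields. The following is a minimal sketch using Python's standard shlex module; the function name is hypothetical, and the sketch assumes that quoted values such as msg contain no unbalanced quotation marks.

```python
import shlex


def parse_ha_log_line(line: str) -> dict[str, str]:
    """Split a key=value log line into a dict; tokens without '=' are skipped."""
    fields = {}
    # shlex keeps the quoted msg="..." value together as a single token.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


sample = ('2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=notice '
          'user=ha ui=ha action=unknown status=success '
          'msg="monitord: peer stop responding (heartbeat), assuming PRIMARY role"')
record = parse_ha_log_line(sample)
print(record["pri"], "|", record["msg"])
```

The leading date and time carry no key, so this sketch ignores them; a real collector would likely capture them separately.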

    Recovering from temporary failure of the primary unit

    After P1 recovers from the hardware failure, what happens next to the HA group depends on P1’s HA On failure settings under System > High Availability > Configuration.

    HA On Failure settings

    • switch off

    P1 will not process email or join the HA group until you manually select the effective HA operating mode (see Using high availability (HA)).

    • wait for recovery then restore original role

    On recovery, P1’s effective HA operating mode resumes its configured primary role. This also means that S2 needs to give back the primary role to P1. This behavior may be useful if the cause of failure is temporary and rare, but may cause problems if the cause of failure is permanent or persistent.

    In this case, S2 will send out another alert email similar to the following:

    This is the HA machine at 172.16.5.11.

    The following event has occurred

    ‘SECONDARY asks us to switch roles (recovery after a restart)’

    The state changed from ‘PRIMARY’ to ‘SECONDARY’

    After recovery, P1 also sends out an alert email similar to the following:

    This is the HA machine at 172.16.5.10.

    The following critical event was detected

    The system was shutdown!

    • wait for recovery then restore secondary role

    On recovery, P1’s effective HA operating mode becomes secondary, and S2 continues to assume the primary role. P1 then synchronizes the content of its MTA queue directories with the current primary unit, S2. S2 can then deliver email that existed in P1’s MTA queue directory at the time of the failover. For information on manually restoring the FortiMail unit to acting in its configured HA mode of operation, see Using high availability (HA).
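The three On failure outcomes described above can be summarized as a lookup from the setting to the effective HA operating modes of P1 and S2 once P1 has recovered. This is only a restatement of the text in code form; the dictionary and function names are hypothetical, and the "switch off" entry assumes that S2 keeps the primary role it took over during the failover.

```python
# On failure setting -> (P1 effective mode, S2 effective mode) after P1 recovers.
ON_FAILURE_RESULT = {
    # P1 stays out of the group until an administrator intervenes.
    "switch off": ("off", "primary"),
    # S2 gives the primary role back to P1.
    "wait for recovery then restore original role": ("primary", "secondary"),
    # P1 rejoins as secondary and S2 keeps the primary role.
    "wait for recovery then restore secondary role": ("secondary", "primary"),
}


def roles_after_recovery(on_failure: str) -> tuple[str, str]:
    """Return (P1 mode, S2 mode) for the given On failure setting."""
    return ON_FAILURE_RESULT[on_failure]
```

Only the "restore original role" setting results in a second role change, which is why it may cause problems if the cause of failure is persistent.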

    Failover scenario 2: System reboot or reload of the primary unit

    If you need to reboot or reload (not shut down) P1 for any reason, such as a firmware upgrade or a process restart, by using the CLI commands execute reboot or execute reload <httpd...>, or by clicking System > Reboot from the top-right corner of the GUI:

    • P1 will send a holdoff command to S2 so that S2 will not take over the primary role during P1’s reboot.
    • P1 will also send out an alert email similar to the following:

    This is the HA machine at 172.16.5.10.

    The following critical event was detected

    The system is rebooting (or reloading)!

    • S2 will hold off checking the services and heartbeat with P1. Note that S2 will only hold off for about 15 minutes. In case P1 never boots up, S2 will take over the primary role.
    • S2 will send out an alert email, indicating that S2 received the holdoff command from P1.

    This is the HA machine at 172.16.5.11.

    The following event has occurred

    ‘peer rebooting (or reloading)’

    The state changed from ‘SECONDARY’ to ‘HOLD_OFF’

    After P1 is up again:

    • P1 will send another command to S2 and ask S2 to change its state from holdoff to secondary and resume monitoring P1’s services and heartbeat.
    • S2 will send out an alert email, indicating that S2 received instruction commands from P1.

    This is the HA machine at 172.16.5.11.

    The following event has occurred

    ‘peer command appeared’

    The state changed from ‘HOLD_OFF’ to ‘SECONDARY’

    • S2 logs the event in the HA logs.
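The state changes that S2 goes through in this scenario can be modeled as a small state machine. This sketch is illustrative only: the state names follow the alert email text above, but the event names, the function, and the exact 15-minute limit are assumptions, since the text says only that S2 holds off for about 15 minutes.

```python
HOLD_OFF_LIMIT_SECONDS = 15 * 60  # "about 15 minutes"; exact value is assumed


def next_state(state: str, event: str, held_for: float = 0.0) -> str:
    """Return S2's next state for a given event (event names are hypothetical)."""
    # P1 announces a reboot or reload: S2 stops checking services and heartbeat.
    if state == "SECONDARY" and event == "peer_rebooting":
        return "HOLD_OFF"
    # P1 is back up and tells S2 to resume monitoring.
    if state == "HOLD_OFF" and event == "peer_command":
        return "SECONDARY"
    # P1 never came back within the hold-off period: S2 takes over.
    if state == "HOLD_OFF" and event == "tick" and held_for >= HOLD_OFF_LIMIT_SECONDS:
        return "PRIMARY"
    return state
```

The hold-off state exists so that a planned reboot of P1 is not mistaken for a failure, while the time limit still protects against P1 never returning.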

    Failover scenario 3: System reboot or reload of the secondary unit

    If you need to reboot or reload (not shut down) S2 for any reason, such as a firmware upgrade or a process restart, by using the CLI commands execute reboot or execute reload <httpd...>, or by clicking System > Reboot from the top-right corner of the GUI, the behavior of P1 and S2 is as follows:

    • P1 will send out an alert email similar to the following, informing the administrator of the heartbeat loss with S2.

    This is the HA machine at 172.16.5.10.

    The following event has occurred

    ‘ha: SECONDARY heartbeat disappeared’

    • S2 will send out an alert email similar to the following:

    This is the HA machine at 172.16.5.11.

    The following critical event was detected

    The system is rebooting (or reloading)!

    • P1 will also log this event in the HA logs.
    Caution

    For FortiMail v4.0 and older releases:

    • P1 will not send out the alert email.
    • P1 will log the event in the HA logs.

    Failover scenario 4: System shutdown of the secondary unit

    If you shut down S2:

    • No alert email is sent out from either P1 or S2.
    • P1 will log this event in the HA logs.

    Failover scenario 5: Primary heartbeat link fails

    If the primary heartbeat link fails, such as when the cable becomes accidentally disconnected, and if you have not configured a secondary heartbeat link, the FortiMail units in the HA group cannot verify that other units are operating and assume that the other has failed. As a result, the secondary unit (S2) changes to operating as a primary unit, and both FortiMail units are acting as primary units.

    Two primary units connected to the same network may cause address conflicts on your network because matching interfaces will have the same IP addresses. Additionally, because the heartbeat link is interrupted, the FortiMail units in the HA group cannot synchronize configuration changes or mail data changes.

    Even after reconnecting the heartbeat link, both units will continue operating as primary units. To return the HA group to normal operation, you must connect to the web-based manager of S2 to restore it as the secondary unit.

    1. The FortiMail HA group is operating normally.
    2. The heartbeat link Ethernet cable is accidentally disconnected.
    3. S2’s HA heartbeat test detects that the primary unit has failed.

      How soon this happens depends on the HA daemon configuration of S2.

    4. The effective HA operating mode of S2 changes to primary.
    5. S2 sends an alert email similar to the following, indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to primary.

      This is the HA machine at 172.16.5.11.

      The following event has occurred

      ‘PRIMARY heartbeat disappeared’

      The state changed from ‘SECONDARY’ to ‘PRIMARY’

    6. S2 records the following event log messages (among others) indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to primary.

      2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: peer stop responding (heartbeat), assuming PRIMARY role"

      2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop stopping"

      2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop stopping"

      2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop stopping"

      2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop starting, entering primary mode"

      2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop starting, entering primary mode"

      2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop starting, entering PRIMARY mode"
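
    If you collect these event log messages for review, the key=value layout shown above is straightforward to split into fields. The following Python sketch is an illustration only: parse_ha_log_line is a hypothetical helper, and the sketch assumes every line follows the layout of the sample messages in this section (date, time, then key=value pairs, with msg in double quotes).

```python
import re
from datetime import datetime

# key=value pairs; a value is either a double-quoted string or a bare token.
FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_ha_log_line(line):
    """Split one event log line into a timestamp and a dict of its fields."""
    date_part, time_part, rest = line.strip().split(" ", 2)
    fields = {key: value.strip('"') for key, value in FIELD_RE.findall(rest)}
    fields["timestamp"] = datetime.strptime(
        f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S"
    )
    return fields


sample = (
    '2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=notice '
    'user=ha ui=ha action=unknown status=success '
    'msg="monitord: peer stop responding (heartbeat), assuming PRIMARY role"'
)
entry = parse_ha_log_line(sample)

# The msg field starts with the name of the HA daemon that wrote it.
daemon, _, message = entry["msg"].partition(": ")
```

    Separating the daemon name (monitord, backupd, or configd in these samples) from the rest of the message makes it easier to follow each daemon's stop and start sequence during a failover.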

    Recovering from a heartbeat link failure

    Because the hardware failure is not permanent (that is, the failure of the heartbeat link was caused by a disconnected cable, not a failed port on one of the FortiMail units), you may want to return both FortiMail units to operating in their configured modes when rejoining the failed primary unit to the HA group.

    To return to normal operation after the heartbeat link fails
    1. Reconnect the primary heartbeat interface by reconnecting the heartbeat link Ethernet cable.

      Even though the effective HA operating mode of S2 is primary, S2 continues to attempt to find the other primary unit. When the heartbeat link is reconnected, S2 finds P1 and determines that P1 is also operating as a primary unit. So S2 sends a heartbeat signal to notify P1 to stop operating as a primary unit. The effective HA operating mode of P1 changes to off.

    2. P1 sends an alert email similar to the following, indicating that P1 has stopped operating as the primary unit.

      This is the HA machine at 172.16.5.10.

      The following event has occurred

      'SECONDARY asks us to switch roles (user requested takeover)'

      The state changed from 'PRIMARY' to 'OFF'

    3. P1 records the following event log messages (among others) indicating that P1 is switching to off mode.

      2005-11-30 17:13:06 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: remote detected problem, shutting down"

      2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop stopping"

      2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop stopping"

      2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop stopping"

      2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop starting, entering off mode"

      2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop starting, entering off mode"

      The configured HA mode of operation of P1 is primary and the effective HA operating mode of P1 is off.

      The configured HA mode of operation of S2 is secondary and the effective HA operating mode of S2 is primary.

      P1 synchronizes the content of its MTA queue directories to S2. Email in these directories can now be delivered by S2.

    4. Connect to the web-based manager of P1, go to System > High Availability > Status.
    5. Check for synchronization messages.

      Do not proceed to the next step until P1 has synchronized with S2.

    6. Connect to the web-based manager of S2, go to System > High Availability > Status and select click HERE to restore configured operating mode.

      The HA group should return to normal operation. P1 records the following event log message (among others) indicating that S2 asked P1 to return to operating as the primary unit.

      2005-11-30 18:10:00 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: being asked to assume original role"

    7. P1 and S2 synchronize their MTA queue directories. All email in these directories can now be delivered by P1.

    Failover scenario 6: Network connection between primary and secondary units fails (remote service monitoring detects a failure)

    Depending on your network configuration, the network connection between the primary and secondary units can fail for a number of reasons. In the network configuration shown in Example active-passive HA group, the connection between port1 of the primary unit (P1) and port1 of the secondary unit (S2) can fail if a network cable is disconnected or if the switch between P1 and S2 fails.

    A more complex network configuration could include a number of network devices between the primary and secondary unit’s non-heartbeat network interfaces. In any configuration, remote service monitoring can only detect a communication failure. Remote service monitoring cannot determine where the failure occurred or the reason for the failure.

    In this scenario, remote service monitoring has been configured to make sure that S2 can connect to P1. The On failure setting located in the HA main configuration section is wait for recovery then restore secondary role. For information on the On failure setting, see On failure. For information about remote service monitoring, see Configuring service-based failover.

    The failure occurs when power to the switch that connects the P1 and S2 port1 interfaces is disconnected. Remote service monitoring detects the failure of the network connection between the primary and secondary units. Because of the On failure setting, P1 changes its effective HA operating mode to failed.

    When the failure is corrected, P1 detects the correction because while operating in failed mode P1 has been attempting to connect to S2 using the port1 interface. When P1 can connect to S2, the effective HA operating mode of P1 changes to secondary and the mail data on P1 will be synchronized to S2. S2 can now deliver this mail. The HA group continues to operate in this manner until an administrator resets the effective HA modes of operation of the FortiMail units.
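
    The changes of effective HA operating mode described above can be summarized as a small transition table. The following Python sketch is a simplified model of this scenario only, not of FortiMail's HA daemons: the event names are hypothetical labels chosen for the sketch, and it assumes the On failure setting is wait for recovery then restore secondary role.

```python
# Transition table for the effective HA operating mode in this scenario.
# Keys are (unit, current effective mode, event); the event names are
# hypothetical labels, not FortiMail identifiers.
TRANSITIONS = {
    ("S2", "secondary", "remote_service_check_failed"): "primary",
    ("P1", "primary", "peer_requested_takeover"): "failed",
    ("P1", "failed", "peer_reachable_again"): "secondary",
}


def next_mode(unit, mode, event):
    """Return the new effective mode, or the current mode if the event
    does not apply to this unit in this mode."""
    return TRANSITIONS.get((unit, mode, event), mode)


def replay(events, modes):
    """Apply (unit, event) pairs in order and return the resulting modes."""
    modes = dict(modes)
    for unit, event in events:
        modes[unit] = next_mode(unit, modes[unit], event)
    return modes


start = {"P1": "primary", "S2": "secondary"}
during_outage = replay(
    [("S2", "remote_service_check_failed"), ("P1", "peer_requested_takeover")],
    start,
)
after_repair = replay([("P1", "peer_reachable_again")], during_outage)
```

    Note that neither unit returns to its configured mode in this model: as in the scenario, that requires an administrator to reset the effective HA operating modes.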

    1. The FortiMail HA group is operating normally.
    2. The power cable for the switch between P1 and S2 is accidentally disconnected.
    3. S2’s remote service monitoring cannot connect to the primary unit.

      How soon this happens depends on the remote service monitoring configuration of S2.

    4. Through the HA heartbeat link, S2 signals P1 to stop operating as the primary unit.
    5. The effective HA operating mode of P1 changes to failed.
    6. The effective HA operating mode of S2 changes to primary.
    7. S2 sends an alert email similar to the following, indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to primary.

      This is the HA machine at 172.16.5.11.

      The following event has occurred

      'PRIMARY remote service disappeared'

      The state changed from 'SECONDARY' to 'PRIMARY'

    8. S2 records the following event log messages (among others) indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to primary.

      2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: peer stop responding (heartbeat), assuming PRIMARY role"

      2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop stopping"

      2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop stopping"

      2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop stopping"

      2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop starting, entering primary mode"

      2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop starting, entering primary mode"

      2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop starting, entering PRIMARY mode"

    9. P1 sends an alert email similar to the following, indicating that P1 has stopped operating in HA mode.

      This is the HA machine at 172.16.5.10.

      The following event has occurred

      'SECONDARY asks us to switch roles (user requested takeover)'

      The state changed from 'PRIMARY' to 'FAILED'

    10. P1 records the following log messages (among others) indicating that P1 is switching to failed mode.

      2005-11-30 17:13:06 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: remote detected problem, shutting down"

      2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop stopping"

      2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop stopping"

      2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop stopping"

      2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop starting, entering off mode"

      2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop starting, entering failed mode"

    Recovering from a network connection failure

    Because the network connection failure was not caused by failure of either FortiMail unit, you may want to return both FortiMail units to operating in their configured modes when rejoining the failed primary unit to the HA group.

    To return to normal operation after the network connection fails
    1. Reconnect power to the switch.

      Because the effective HA operating mode of P1 is failed, P1 is using remote service monitoring to attempt to connect to S2 through the switch.

    2. When the switch resumes operating, P1 successfully connects to S2.

      P1 has determined that S2 can connect to the network and process email.

    3. The effective HA operating mode of P1 switches to secondary.
    4. P1 logs the event.

      2009-11-30 16:02:08 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop starting, entering primary mode"

      2009-11-30 16:02:08 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop starting, entering primary mode"

      2009-11-30 16:02:13 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: starting pre-amble"

      2009-11-30 16:02:13 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: ** response from peer, setting to SECONDARY mode"

    5. P1 sends an alert email similar to the following, indicating that P1 is switching its effective HA operating mode to secondary.

      This is the HA machine at 172.16.5.10.

      The following event has occurred

      'SECONDARY asks us to switch roles (user requested takeover)'

      The state changed from 'FAILED' to 'SECONDARY'

    6. P1 synchronizes the content of its MTA queue directories to S2. S2 can now deliver all email in these directories.

      The HA group can continue to operate with S2 as the primary unit and P1 as the secondary unit. However, you can use the following steps to restore each unit to its configured HA mode of operation.

    7. Connect to the web-based manager of P1 and go to System > High Availability > Status.
    8. Check for synchronization messages.

      Do not proceed to the next step until P1 has synchronized with S2.

    9. Connect to the web-based manager of S2, go to System > High Availability > Status and select click HERE to restore configured operating mode.
    10. Connect to the web-based manager of P1, go to System > High Availability > Status and select click HERE to restore configured operating mode.

      P1 should return to operating as the primary unit and S2 should return to operating as the secondary unit.

    11. P1 and S2 synchronize their MTA queue directories again. P1 can now deliver all email in these directories.

    Example: Active-passive HA group in gateway mode

    In this example, two FortiMail-400 units are configured to operate in gateway mode as an active-passive HA group.

    The procedures in this example describe HA configuration necessary to achieve this scenario. Before beginning, verify that both of the FortiMail units are already:

    Virtual IP address for HA failover

    The active-passive HA group is located on a private network with email users and the protected email server. All are behind a FortiGate unit which separates the private network from the Internet. The DNS server, remote email users, and external SMTP servers are located on the Internet.

    For both FortiMail units:

    port1

    • connected to a switch which is connected only to the computer that the FortiMail administrator uses to manage the HA group
    • administrative access occurs through this port

    port3

    • connected to a switch which is connected to the private network and, indirectly, the Internet
    • email connections occur through this port

    port6

    • connected directly to each other using a crossover cable
    • heartbeat and synchronization occur through this port

    The secondary unit will become the new primary unit when a failover occurs. In order for it to receive the connections formerly destined for the failed primary unit, the new primary unit must adopt the failed primary unit’s IP address. You will configure an HA virtual IP address on port3 for this purpose.

    While the configured primary unit is functional, the HA virtual IP address is associated with its port3 network interface, which receives email connections. After a failover, the HA virtual IP address becomes associated with the new primary unit’s port3. As a result, after a failover, the new primary unit (originally the secondary unit) will then receive and process the email connections.

    This example contains the following topics:

    About standalone versus HA deployment

    If you plan to convert a standalone FortiMail unit to a member of an HA group, first understand how the HA deployment shown in Virtual IP address for HA failover resembles and differs from a standalone deployment, and which changes the conversion requires.

    Examine the network interface configuration of a standalone FortiMail-400 unit in the following table.

    Example standalone network interface configuration

    | Network interface | IP address | Description |
    |---|---|---|
    | port1 | 192.168.1.5 | Administrative connections to the FortiMail unit. |
    | port2, port4 | Default | Not connected. |
    | port3 | 172.16.1.2 | Email connections to the FortiMail unit; the target of your email DNS A records (No administrative access). |
    | port5 | Default | Not connected. |
    | port6 | Default | Not connected. |

    Similarly, for the HA group, DNS A records should target the IP address of the port3 interface of the primary FortiMail-400 unit. Additionally, administrators should administer each FortiMail unit in the HA group by connecting to the IP address of each FortiMail unit’s port1.

    If a failover occurs, the network must be able to direct traffic to port3 of the secondary unit without reconfiguring the DNS A record target. The secondary unit must cleanly and automatically substitute for the primary unit, as if they were a single, standalone unit.

    Unlike the configuration of the standalone unit, for the HA group to accomplish that substitution, all email connections must use an IP address that transfers between the primary unit and the secondary unit according to which is currently the primary unit. This transferable IP address can be accomplished by configuring the HA group to either:

    • set the IP address of the current primary unit’s network interface
    • add a virtual IP address to the current primary unit’s network interface

    In this example, the HA group uses the method of adding a virtual IP address. Email connections will not use the actual IP address of port3. Instead, all email connections will use only the virtual IP address 172.16.1.2, which is used by port3 of whichever FortiMail unit’s effective HA operating mode is currently primary. During normal HA group operation, this IP address resides on the primary unit. Conversely, after a failover occurs, this IP address resides on the former secondary unit (now the current primary unit).
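
    The way the virtual IP address follows the primary role can be illustrated with a short model. The following Python sketch is not FortiMail code: port3_addresses is a hypothetical function, the unit labels are chosen for the sketch, and the actual port3 addresses (172.16.1.5 and 172.16.1.6) are taken from the interface tables in this example.

```python
VIRTUAL_IP = "172.16.1.2"

# Actual port3 addresses from this example; they never move between units.
ACTUAL_PORT3 = {
    "primary_unit": "172.16.1.5",
    "secondary_unit": "172.16.1.6",
}


def port3_addresses(effective_modes):
    """Map each unit to the list of addresses its port3 answers on."""
    result = {}
    for unit, mode in effective_modes.items():
        addresses = [ACTUAL_PORT3[unit]]
        if mode == "primary":
            # The virtual IP is held by whichever unit is effectively primary.
            addresses.append(VIRTUAL_IP)
        result[unit] = addresses
    return result


normal = port3_addresses({"primary_unit": "primary", "secondary_unit": "secondary"})
failover = port3_addresses({"primary_unit": "failed", "secondary_unit": "primary"})
```

    In both states exactly one unit holds 172.16.1.2, which is why DNS records and firewall policies that target the virtual IP address do not need to change after a failover.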

    Also unlike the configuration of the standalone unit, both port5 and port6 are configured for each member of the HA group. The primary unit’s port5 is directly connected using a crossover cable to the secondary unit’s port5; the primary unit’s port6 is directly connected to the secondary unit’s port6. These links are used solely for heartbeat and synchronization traffic between members of the HA group.

    For comparison with the standalone unit, examine the network configuration of the primary unit in the following table.

    Example primary unit HA network interface configuration

    | Interface | IP/Netmask | Virtual IP setting | Virtual IP address | Description |
    |---|---|---|---|---|
    | port1 | 192.168.1.5 | Ignore | | Administrative connections to this FortiMail unit. Because the IP address does not follow the primary FortiMail unit, connections to this IP address are specific to this physical unit. Administrators can still connect to this FortiMail unit after failover, which may be useful for diagnostic purposes. |
    | port2, port4 | Default | Ignore | | Not connected. |
    | port3 | 172.16.1.5 | Set | 172.16.1.2 | Email connections to the FortiMail unit; the target of your email DNS MX and A records. Connections should not be destined for the actual IP address, but instead the virtual IP address (172.16.1.2) which follows the primary FortiMail unit. No administrative access. |
    | port5 | 10.0.1.2 | Ignore | | Secondary heartbeat and synchronization interface. |
    | port6 | 10.0.0.2 | Ignore | | Primary heartbeat and synchronization interface. |

    Because the Virtual IP action settings are synchronized between the primary and secondary units, you do not need to configure them separately on the secondary unit. However, you must configure the secondary unit with other settings listed in the following table.

    Example secondary unit HA network interface configuration

    | Interface | IP/Netmask | Virtual IP setting | Virtual IP address | Description |
    |---|---|---|---|---|
    | port1 | 192.168.1.6 | (synchronized from primary unit) | (synchronized from primary unit) | Administrative connections to this FortiMail unit. Because the IP address does not follow the primary FortiMail unit, connections to this IP address are specific to this physical unit. Administrators can connect to this FortiMail unit even when it is currently the secondary unit, which may be useful for HA configuration and log viewing. |
    | port2, port4 | Default | (synchronized from primary unit) | (synchronized from primary unit) | Not connected. |
    | port3 | 172.16.1.6 | (synchronized from primary unit) | (synchronized from primary unit) | Connections should not be destined for the actual IP address, but instead the virtual IP address (172.16.1.2) which follows the primary FortiMail unit. As a result, no connections should be destined for this network interface until a failover occurs, causing the secondary unit to become the new primary unit. No administrative access. |
    | port5 | 10.0.1.4 | (synchronized from primary unit) | (synchronized from primary unit) | Secondary heartbeat and synchronization interface. |
    | port6 | 10.0.0.4 | (synchronized from primary unit) | (synchronized from primary unit) | Primary heartbeat and synchronization interface. |

    Configuring the DNS and firewall settings

    In the example shown in Virtual IP address for HA failover, SMTP clients will connect to the virtual IP address of the primary unit. For SMTP clients on the Internet, this connection occurs through the public network virtual IP on the FortiGate unit, whose policies allow the connections and route them to the virtual IP on the current primary unit.

    Because the FortiMail HA group is installed behind a firewall performing NAT, the DNS server hosting records for the domain example.com must be configured to reflect the public IP address of the FortiGate unit, rather than the private network IP address of the HA group.

    The DNS server has been configured with:

    • an MX record to indicate that the FortiMail unit is the email gateway for example.com
    • an A record to resolve fortimail.example.com into the FortiGate unit’s public IP address
    • a reverse DNS record to enable external email servers to resolve the public IP address of the FortiGate unit into the domain name of the FortiMail unit

    Configuring the primary unit for HA operation

    The following procedure describes how to prepare a FortiMail unit for HA operation as the primary unit according to Virtual IP address for HA failover.

    In a typical standalone gateway mode configuration, you might set the IP address of the FortiMail-400 unit’s port3 network interface to 172.16.1.2. The FortiGate unit would be configured to NAT email connections to and from that IP address.

    To simulate the same configuration with the active-passive HA group, you will set the actual IP addresses of the port3 interfaces of the primary and secondary units to different IP addresses. Then, in the HA options, you will add a virtual IP address of 172.16.1.2 to port3.

    Before beginning this procedure, verify that you have completed the required preparations described in Example: Active-passive HA group in gateway mode.

    To configure the primary unit for HA operation
    1. Connect to the web-based manager of the primary unit at https://192.168.1.5/admin.
    2. Go to System > Network > Interface.
    3. Configure port6 to 10.0.0.2/255.255.255.0 and port5 to 10.0.1.2/255.255.255.0.
    4. Go to System > High Availability > Configuration.
    5. Configure the following:

      HA Configuration section

        Mode of operation: primary
        On failure: wait for recovery then assume secondary role
        Shared password: change_me

      Backup options section

        Backup mail data directories: enabled
        Backup MTA queue directories: disabled

      Advanced options section

        See Configuring the advanced options.

        HA base port: 2000
        Heartbeat lost threshold: 15 seconds
        Remote services as heartbeat: disabled

      Interface section

        See Configuring interface monitoring.

        Interface: port6
        Enable port monitor: Enabled
        Heartbeat status: Primary
        Peer IP address: 10.0.0.4

        Interface: port5
        Enable port monitor: Enabled
        Heartbeat status: Secondary
        Peer IP address: 10.0.1.4

      Virtual IP Address

        port1: Ignore
        port2: Ignore
        port3: Set, 172.16.1.2/255.255.255.0
        port4: Ignore
        port5: Ignore
        port6: Ignore

    6. Click Apply.

      The FortiMail unit switches to active-passive HA mode, and, after determining that there is no other primary unit, sets its effective HA operating mode to primary. The virtual IP 172.16.1.2 is added to port3; if not already complete, configure DNS records and firewalls to route email traffic to this virtual IP address, not the actual IP address of the port3 network interface.

    7. To confirm that the FortiMail unit is acting as the primary unit, go to System > High Availability > Status and compare the Configured Operating Mode and Effective Operating Mode. Both should be primary.

      If the effective HA operating mode is not primary, the FortiMail unit is not acting as the primary unit. Determine the cause of the failover, then restore the effective operating mode to that matching its configured HA mode of operation.

    Configuring the secondary unit for HA operation

    The following procedure describes how to prepare a FortiMail unit for HA operation as the secondary unit according to Virtual IP address for HA failover.

    Before beginning this procedure, verify that you have completed the required preparations described in Example: Active-passive HA group in gateway mode. Also verify that you configured the primary unit as described in Configuring the primary unit for HA operation.

    To configure the secondary unit for HA operation
    1. Connect to the web-based manager of the secondary unit at https://192.168.1.6/admin.
    2. Go to System > Network > Interface.
    3. Configure port6 to 10.0.0.4/255.255.255.0 and port5 to 10.0.1.4/255.255.255.0.
    4. Go to System > High Availability > Configuration.
    5. Configure the following:

      Main Configuration section

        See Configuring the primary HA options

        Mode of operation: secondary
        On failure: wait for recovery then restore secondary role
        Shared password: change_me

      Backup options section

        Backup mail data directories: enabled
        Backup MTA queue directories: disabled

      Advanced options section

        See Configuring the advanced options.

        HA base port: 2000
        Heartbeat lost threshold: 15 seconds
        Remote services as heartbeat: disabled

      Interface section

        See Configuring interface monitoring.

        Interface: port6
        Heartbeat status: primary
        Peer IP address: 10.0.0.2

        Interface: port5
        Heartbeat status: secondary
        Peer IP address: 10.0.1.2

      Virtual IP Address

        (The virtual IP configuration of the ports is synchronized from the primary unit, and therefore does not need to be configured on the secondary unit.)

        port1: Ignore
        port2: Ignore
        port3: Set, 172.16.1.2/255.255.255.0
        port4: Ignore
        port5: Ignore
        port6: Ignore

    6. Click Apply.

      The FortiMail unit switches to active-passive HA mode, and, after determining that the primary unit is available, sets its effective HA operating mode to secondary.

    7. Go to System > High Availability > Status.
    8. Select click HERE to start a configuration/data sync.

      The secondary unit synchronizes its configuration with the primary unit, including Virtual IP action settings that configure the HA virtual IP that the secondary unit will adopt on failover.

    9. To confirm that the FortiMail unit is acting as the secondary unit, go to System > High Availability > Status and compare the Configured Operating Mode and Effective Operating Mode. Both should be secondary.

      If the effective HA operating mode is not secondary, the FortiMail unit is not acting as the secondary unit. Determine the cause of the failover, then restore the effective operating mode to that matching its configured HA mode of operation.

      Note

      If the heartbeat interfaces are not connected, the secondary unit cannot connect to the primary unit, and so the secondary unit will operate as though the primary unit has failed and will switch its effective HA operating mode to primary.

      When both the primary unit and the secondary unit are operating in their configured modes, configuration of the active-passive HA group is complete. For information on managing both members of the HA group, see Administering an HA group.
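
    Because each unit's Peer IP address must point at the other unit's heartbeat interface, a mistyped address is an easy way to end up with two primary units. The following Python sketch cross-checks the heartbeat values used in this example with the standard ipaddress module. It is an offline sanity check under stated assumptions, not a FortiMail tool: the HEARTBEAT dictionary and the check_heartbeat function are hypothetical, and the values are copied by hand from the two procedures above.

```python
import ipaddress

# Heartbeat settings from this example: (local address/netmask, peer IP
# address) as entered on each unit.
HEARTBEAT = {
    "primary unit": {
        "port6": ("10.0.0.2/255.255.255.0", "10.0.0.4"),
        "port5": ("10.0.1.2/255.255.255.0", "10.0.1.4"),
    },
    "secondary unit": {
        "port6": ("10.0.0.4/255.255.255.0", "10.0.0.2"),
        "port5": ("10.0.1.4/255.255.255.0", "10.0.1.2"),
    },
}


def check_heartbeat(config):
    """Return a list of problems; an empty list means both units agree."""
    problems = []
    (name_a, ports_a), (name_b, ports_b) = config.items()
    for port, (local_a, peer_a) in ports_a.items():
        local_b, peer_b = ports_b[port]
        iface_a = ipaddress.ip_interface(local_a)
        iface_b = ipaddress.ip_interface(local_b)
        if iface_a.network != iface_b.network:
            problems.append(f"{port}: units are on different subnets")
        if ipaddress.ip_address(peer_a) != iface_b.ip:
            problems.append(f"{port}: {name_a} peer IP does not match {name_b}")
        if ipaddress.ip_address(peer_b) != iface_a.ip:
            problems.append(f"{port}: {name_b} peer IP does not match {name_a}")
    return problems


problems = check_heartbeat(HEARTBEAT)
```

    With the values from this example the list of problems is empty. Changing one peer address in the dictionary shows the kind of mismatch that would leave the units unable to exchange heartbeats.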

    Administering an HA group

    In most cases, you will administer an HA group by connecting to the primary unit as if it were a standalone unit.

    Management tasks performed on each HA group member

    Connect to...

    For...

    Primary unit

    (192.168.1.5)

    • synchronized configuration items, such as antispam settings
    • primary unit HA management tasks, such as viewing its effective HA operating mode and configuring its HA mode and Shared password
    • viewing the log messages of the primary unit

    Secondary unit

    (192.168.1.6)

    • secondary unit HA management tasks, such as viewing its effective HA operating mode and configuring its HA mode and Shared password
    • viewing the log messages of the secondary unit

    If the initial configuration synchronization fails, such as if it is disrupted or the network cable is loose, you should manually trigger synchronization after changing the configuration of the primary unit. For information on manually triggering configuration synchronization, see Using high availability (HA).

    Note

    Some parts of the configuration are not synchronized, and must be configured separately on each member of the HA group. For details, see Configuration settings that are not synchronized.

    Using high availability (HA)

    Go to System > High Availability to configure the FortiMail unit to act as a member of a high availability (HA) cluster in order to increase processing capacity or availability.

    For the general procedure of how to enable and configure HA, see How to use HA.

    This section contains the following topics:

    About high availability

    FortiMail units can operate in one of two HA modes, active-passive or config-only.

    Comparison of HA modes

    | Active-passive HA | Config-only HA |
    |---|---|
    | 2 FortiMail units in the HA group | 2-25 FortiMail units in the HA group |
    | Typically deployed behind a switch | Typically deployed behind a load balancer |
    | Both configuration* and data synchronized | Only configuration* synchronized |
    | Only primary unit processes email | All units process email |
    | No data loss when hardware fails | Data loss when hardware fails |
    | Failover protection, but no increased processing capacity | Increased processing capacity, but no failover protection |

    * For exceptions to synchronized configuration items, see Configuration settings that are not synchronized.

    Active-passive HA group operating in gateway mode

    Config-only HA group operating in gateway mode

    Note

    If the config-only HA group is installed behind a load balancer, the load balancer stops sending email to failed FortiMail units. All sessions being processed by the failed FortiMail unit must be restarted and will be re-directed by the load balancer to other FortiMail units in the config-only HA group.

    You can mix different FortiMail models in the same HA group. However, all units in the HA group must have the same firmware version.

    Note

    When mixing FortiMail models, the HA group is limited by the capacity and configuration limits of the least powerful model.

    Communications between HA cluster members occur through the heartbeat and synchronization connection. For details, see About the heartbeat and synchronization.

    To configure FortiMail units operating in HA mode, you usually connect only to the primary unit. The primary unit’s configuration is almost entirely synchronized to secondary units, so that changes made to the primary unit are propagated to the secondary units. The web-based manager of the secondary unit may display “SECONDARY MODE” as a reminder that most configuration changes cannot be made through the secondary unit, but instead must be made through the primary unit. For details, see Banner.

    Exceptions to this rule include connecting to a secondary unit in order to view log messages recorded about the secondary unit itself on its own hard disk, and connecting to a secondary unit to configure settings that are not synchronized. For details, see Configuration settings that are not synchronized.

    Note

    To use FortiGuard Antivirus or FortiGuard Antispam with HA, license all FortiMail units in the cluster. If you license only the primary unit in an active-passive HA group, after a failover, the secondary unit cannot connect to the FortiGuard Antispam service. For FortiMail units in a config-only HA group, only the licensed unit can use the subscription services.

    For instructions on how to enable and configure HA, see How to use HA.

    See also

    How to use HA

    About the heartbeat and synchronization

    About logging, alert email and SNMP in HA

    Storing mail data on a NAS server

    Example: Failover scenarios

    Example: Active-passive HA group in gateway mode

    About the heartbeat and synchronization

    Heartbeat and synchronization traffic consists of TCP packets transmitted between the FortiMail units in the HA group through the primary and secondary heartbeat interfaces.

    Note

    Service monitoring traffic can also, for short periods, be used as a heartbeat. For details, see Remote services as heartbeat.

    Heartbeat and synchronization traffic has three primary functions:

    • to monitor the responsiveness of the HA group members
    • to synchronize configuration changes from the primary unit to the secondary units

      For exceptions to synchronized configuration items, see Configuration settings that are not synchronized.

    • to synchronize mail data from the primary unit to the secondary unit (active-passive only)

      Mail data consists of the FortiMail system mail directory, user home directories, and mail queue.

    Note

    FortiGuard Antispam packages and FortiGuard Antivirus engines and definitions are not synchronized between primary and secondary units.

    When the primary unit’s configuration changes, it immediately synchronizes the change to the secondary unit (or, in a config-only HA group, to the peer units) through the primary heartbeat interface. If this fails, or if you have inadvertently de-synchronized the secondary unit’s configuration, you can manually initiate synchronization. For details, see Using high availability (HA). You can also use the CLI command diagnose system ha sync on either the primary unit or the secondary unit to manually synchronize the configuration. For details, see the FortiMail CLI Reference.

    During normal operation, the secondary unit expects to constantly receive heartbeat traffic from the primary unit. Loss of the heartbeat signal interrupts the HA group, and, if it is active-passive in style, generally triggers a failover. For details, see Failover scenario 1: Temporary failure of the primary unit.

    Exceptions include system restarts and the execute reload CLI command. In case of a system reboot or reload of the primary unit, the primary unit signals the secondary unit to wait for the primary unit to complete the restart or reload. For details, see Failover scenario 2: System reboot or reload of the primary unit.

    Periodically, the secondary unit checks with the primary unit to see if there are any configuration changes on the primary unit. If there are configuration changes, the secondary unit will pull the configuration changes from the primary unit, generate a new configuration, and reload the new configuration. In this case, both the primary and secondary units send alert email. For details, see Failover scenario 3: System reboot or reload of the secondary unit.

    Behavior varies by your HA mode when the heartbeat fails:

    • Active-passive HA

    A new primary unit is elected: the secondary unit becomes the new primary unit and takes over email processing. During the failover, no mail data or configuration changes are lost, but some in-progress email deliveries may be interrupted. These interrupted deliveries may need to be restarted, but most email clients and servers can gracefully handle this. Additional failover behaviors may be configured. For details, see On failure.

    Note

    Maintain the heartbeat connection. If the heartbeat is accidentally interrupted for an active-passive HA group, such as when a network cable is temporarily disconnected, the secondary unit will assume that the primary unit has failed, and become the new primary unit. If no failure has actually occurred, both FortiMail units will be operating as primary units simultaneously. For details on correcting this, see Using high availability (HA).

    • Config-only HA

    Each secondary unit continues to operate normally. However, with no primary unit, changes to the configuration are no longer synchronized. You must manually configure one of the secondary units to operate as the primary unit, synchronizing its changes to the remaining secondary units.

    For failover examples and steps required to restore normal operation of the HA group in each case, see Example: Failover scenarios.
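    The reactions described above can be summarized as a small decision function. The following Python sketch is only an illustration of the documented behavior and is not part of FortiMail: the function name, its parameters, and the returned strings are all hypothetical, and the restart exception is assumed to apply to active-passive groups.

```python
def on_heartbeat_loss(ha_style, restart_announced=False):
    """Return how a secondary unit reacts when the heartbeat is lost.

    ha_style is "active-passive" or "config-only". restart_announced is
    True when the primary unit signaled a reboot or reload beforehand.
    """
    if ha_style == "active-passive":
        if restart_announced:
            # The primary unit asked the secondary unit to wait.
            return "wait for the primary unit to finish its restart or reload"
        # Otherwise the loss of heartbeat generally triggers a failover.
        return "become the new primary unit and process email"
    if ha_style == "config-only":
        # No failover: email processing continues, configuration sync stops.
        return "keep processing email; configuration is no longer synchronized"
    raise ValueError(f"unknown HA style: {ha_style!r}")


print(on_heartbeat_loss("active-passive"))
print(on_heartbeat_loss("config-only"))
```

    In the config-only case, an administrator must still manually configure one of the secondary units as the new primary unit before configuration changes are synchronized again.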

    HA default ports and protocols

    The following default ports are used for HA heartbeat and synchronization. If there is a firewall between the primary and secondary units, make sure that your firewall policies allow the following ports:

    • UDP/20000: Base port for HA heartbeat signal
    • UDP/20001: Synchronization control
    • TCP/20002: File synchronization
    • TCP/20003: Data synchronization
    • TCP/20004: Checksum synchronization
    • TCP/25: HA service monitoring - remote SMTP
    • TCP/80: HA service monitoring - remote HTTP
    • TCP/110: HA service monitoring - remote POP3
    • TCP/143: HA service monitoring - remote IMAP
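    When you review firewall policies, it can help to keep the default port list in a script. The following Python sketch is illustrative only and is not a FortiMail tool: the names HA_PORTS, ports_for, and tcp_port_open are hypothetical, the ports are the defaults listed above (they may differ if you changed the HA base port), and the TCP probe assumes that the peer is listening on the port being tested. UDP ports cannot be verified with a simple connection attempt, so the sketch only lists them.

```python
import socket

# Default HA ports listed above: (protocol, port, purpose).
HA_PORTS = [
    ("udp", 20000, "Base port for HA heartbeat signal"),
    ("udp", 20001, "Synchronization control"),
    ("tcp", 20002, "File synchronization"),
    ("tcp", 20003, "Data synchronization"),
    ("tcp", 20004, "Checksum synchronization"),
    ("tcp", 25, "HA service monitoring - remote SMTP"),
    ("tcp", 80, "HA service monitoring - remote HTTP"),
    ("tcp", 110, "HA service monitoring - remote POP3"),
    ("tcp", 143, "HA service monitoring - remote IMAP"),
]


def ports_for(protocol):
    """Return the port numbers in HA_PORTS that use the given protocol."""
    return [port for proto, port, _ in HA_PORTS if proto == protocol]


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered, unreachable, or timed out.
        return False


print("UDP ports to allow:", ports_for("udp"))
print("TCP ports to allow:", ports_for("tcp"))
```

    To probe a peer, call tcp_port_open with the peer unit's heartbeat IP address and each port returned by ports_for("tcp"). A False result means only that the connection did not succeed; it does not distinguish a firewall block from a closed port.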

    See also

    Configuration settings that are not synchronized

    Synchronization of MTA queue directories after a failover

    About high availability

    About logging, alert email and SNMP in HA

    Storing mail data on a NAS server

    Configuring the HA mode and group

    Configuring service-based failover

    Example: Active-passive HA group in gateway mode

    Example: Failover scenarios

    Configuration settings that are not synchronized

    All configuration settings on the primary unit are synchronized to the secondary unit, except the following:

    HA settings not synchronized

    Operation mode

    You must set the operation mode (gateway, transparent, or server) of each HA group member before configuring HA.

    Host name

    The host name distinguishes members of the cluster. For details, see Host name.

    Static route

    Static routes are not synchronized because the HA units may be in different networks (see Configuring static routes).

    Interface configuration

    (gateway and server mode only)

    Each FortiMail unit in the HA group must be configured with different network interface settings for connectivity purposes. For details, see Configuring the network interfaces.

    Exceptions include some active-passive HA settings which affect the interface configuration for failover purposes. These settings are synchronized. For details, see Virtual IP Address.

    Management IP address

    (transparent mode only)

    Each FortiMail unit in the HA group should be configured with different management IP addresses for connectivity purposes. For details, see About the management IP.

    SNMP system information

    Each FortiMail unit in the HA group will have its own SNMP system information, including the Description, Location, and Contact. For details, see Configuring the network interfaces.

    RAID configuration

    RAID settings are hardware-dependent and determined at boot time by looking at the drives (for software RAID) or the controller (hardware RAID), and are not stored in the system configuration. Therefore, they are not synchronized.

    Main HA configuration

    The main HA configuration, which includes the HA mode of operation (such as primary or secondary), is not synchronized because this configuration must be different on the primary and secondary units. For details, see Configuring the HA mode and group.

    HA Daemon configuration

    The following HA daemon settings are not synchronized:

    • Shared password
    • Backup mail data directories
    • Backup MTA queue directories

    You must add the shared HA password to each unit in the HA group. All units in the HA group must use the same shared password to identify the group.

    Since the mail data and MTA queue backup settings are not synchronized, to use this feature, you must enable it on both the primary and secondary units.

    Synchronized HA daemon options that are active-passive HA settings affect how often the secondary unit tests the primary unit and how the secondary unit synchronizes configuration and mail data. Because the HA daemon settings on the secondary unit control how the HA daemon operates, in a functioning HA group you change the HA daemon’s behavior by changing its configuration on the secondary unit. The HA daemon settings on the primary unit do not affect the operation of the HA daemon.

    HA service monitoring configuration

    In active-passive HA, the HA service monitoring configuration is not synchronized. The remote service monitoring configuration on the secondary unit controls how the secondary unit checks the operation of the primary unit. The local services configuration on the primary unit controls how the primary unit tests its own operation. For details, see Configuring service-based failover.

    Note: You might want to have a different service monitoring configuration on the primary and secondary units. For example, after a failover you may not want service monitoring to operate until you have fixed the problems that caused the failover and have restarted normal operation of the HA group.

    Product name and icon

    The product names and icons under System > Customization > Appearance are not synchronized. All other appearance settings are synchronized.

    Config-only HA

    In config-only HA, the following settings are not synchronized:

    • the local domain name
    • default certificate
    • iSCSI initiator name
    • iSCSI ID for remote storage
    • SNMP settings
    • IP pools (see Configuring IP pools)
    • the quarantine report host name (see Web release host name/IP)
    • IBE settings of base URL, Help content URL, and About content URL
    • Centralized quarantine client IP address
    • Centralized IBE client IP address
    • Block/safe lists, depending on the release: starting with the 5.4.0 release, all system-, domain-, and user-level block/safe lists are synchronized. Before the 5.4.0 release, user-level block/safe lists are not synchronized, but system- and domain-level block/safe lists are. Before the 5.0.2 release, domain-level block/safe lists are not automatically synchronized either.

    Note

    User data is synchronized at predefined time intervals, not in real time.

    See also

    About the heartbeat and synchronization

    Synchronization of MTA queue directories after a failover

    During normal operation, email messages are in one of three states:

    • being received or sent by the primary unit
    • waiting to be delivered in the mail queue
    • stored on the primary unit’s mail data directories (email quarantines, email archives, and email inboxes of server mode)

    When normal operation of an active-passive HA group is interrupted and a failover occurs, sending and receiving is interrupted. The delivery attempt fails, and the sender usually retries sending the email message. However, stored messages remain in the primary unit’s mail data directories.

    You usually should configure HA to synchronize the stored mail data to prevent loss of email messages, but you usually will not want to regularly synchronize the mail queue. This is because, to prevent loss of email messages in the failed primary unit, FortiMail units in active-passive HA use the following failover mechanism:

    Note

    If the failed primary unit’s effective HA operating mode is failed, a sequence similar to the following occurs automatically when the problem that caused the failure is corrected.

    1. The secondary unit detects the failure of the primary unit, and becomes the new primary unit.
    2. The former primary unit restarts, detects the new primary unit, and becomes a secondary unit.

      Note

      You may have to manually restart the failed primary unit.

    3. The former primary unit pushes its mail queue to the new primary unit.

      This synchronization occurs through the heartbeat link between the primary and secondary units, and prevents duplicate email messages from forming in the primary unit’s mail queue.

    4. The new primary unit delivers email in its mail queues, including email messages synchronized from the new secondary unit.

    As a result, as long as the failed primary unit can restart, no email is lost from the mail queue.
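    The sequence above can be illustrated with a toy model. The following Python sketch is not FortiMail code: the Unit class, the function names, and the use of message IDs to detect duplicates are assumptions made only to show why queued email survives a failover once the former primary unit restarts.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """Toy model of an HA group member (hypothetical, for illustration)."""
    name: str
    effective_mode: str  # "primary", "secondary", or "failed"
    mail_queue: list = field(default_factory=list)


def fail_over(primary, secondary):
    """The secondary unit detects the failure and becomes the new primary."""
    primary.effective_mode = "failed"
    secondary.effective_mode = "primary"


def recover(former_primary, new_primary):
    """The former primary restarts as a secondary unit and pushes its
    queued messages to the new primary unit, skipping duplicates."""
    former_primary.effective_mode = "secondary"
    for message_id in former_primary.mail_queue:
        if message_id not in new_primary.mail_queue:
            new_primary.mail_queue.append(message_id)


def deliver_all(unit):
    """The acting primary unit delivers everything in its queue."""
    delivered, unit.mail_queue = unit.mail_queue, []
    return delivered


unit_a = Unit("A", "primary", ["msg-1", "msg-2"])
unit_b = Unit("B", "secondary")
fail_over(unit_a, unit_b)
unit_b.mail_queue.append("msg-3")  # email accepted after the failover
recover(unit_a, unit_b)
print(sorted(deliver_all(unit_b)))
```

    In this model, the messages queued on unit A before the failure are delivered by unit B only after unit A restarts, which matches the condition stated above that the failed primary unit must be able to restart.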

    Even if you choose to synchronize the mail queue, because its contents change very rapidly and synchronization is periodic, there is a chance that some email in these directories will not be synchronized at the exact moment a failover occurs.

    See also

    About the heartbeat and synchronization

    About logging, alert email and SNMP in HA

    To configure logging and alert email, configure the primary unit and enable HA events. When the configuration changes are synchronized to the secondary units, all FortiMail units in the HA group record their own separate log messages and send separate alert email messages. Log data is not synchronized. For details on configuring logging and viewing log messages, see Logs, reports and alerts.

    Note

    To distinguish alert email from each member of the HA cluster, configure a different host name for each member. For details, see Host name.

    To use SNMP, configure each cluster member separately and enable HA events for the community. If you enable SNMP for all units, they can all send SNMP traps. Additionally, you can use an SNMP server to monitor the primary and secondary units for HA settings, such as the HA configured and effective mode of operation. For details on SNMP, see Configuring the network interfaces.

    Note

    To aid in quick discovery and diagnosis of network problems, consider configuring SNMP, Syslog, and/or alert email to monitor the HA cluster for failover messages.

    See also

    Getting HA information using SNMP

    About the heartbeat and synchronization

    Getting HA information using SNMP

    You can use an SNMP manager to get information about how FortiMail HA is operating. The FortiMail MIB (fortimail.mib) and the FortiMail trap MIB (fortimail.trap.mib) include the HA fields listed below.

    FortiMail MIB fields

    fortimail.mib

    • fmlHAEventId: Provides the ID of the most recent HA event.
    • fmlHAUnitIp: Provides the IP address of the port1 interface of the FortiMail unit on which an HA event occurred.
    • fmlHAEventReason: Provides the description of the reason for the HA event.
    • fmlHAMode: Provides the configured HA mode of operation of the FortiMail unit (either primary or secondary).
    • fmlHAEffectiveMode: Provides the effective HA mode of operation (applies to active-passive HA only), either as the primary unit or as the secondary unit. The effective HA mode of operation matches the configured mode of operation unless a failure has occurred.

    fortimail.trap.mib

    • fmlTrapHAEvent: Provides the FortiMail HA trap that is sent when an HA event occurs. This trap includes the contents of the fmlSysSerial, fmlHAEventId, fmlHAUnitIp, and fmlHAEventReason MIB fields.
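    Once an SNMP manager has received and decoded these fields, they can be turned into a readable alert. The following Python sketch assumes that the values have already been decoded into a dictionary keyed by MIB field name; how they are retrieved, and how FortiMail encodes each value, is not covered here. The function names and the sample values are hypothetical.

```python
# Fields that the fmlTrapHAEvent trap is documented to include.
TRAP_FIELDS = ("fmlSysSerial", "fmlHAEventId", "fmlHAUnitIp", "fmlHAEventReason")


def summarize_ha_trap(varbinds):
    """Build a one-line summary from a dict of already-decoded trap fields.

    Raises ValueError if any expected field is missing.
    """
    missing = [name for name in TRAP_FIELDS if name not in varbinds]
    if missing:
        raise ValueError("missing trap fields: " + ", ".join(missing))
    return (
        f"HA event {varbinds['fmlHAEventId']} on unit "
        f"{varbinds['fmlSysSerial']} ({varbinds['fmlHAUnitIp']}): "
        f"{varbinds['fmlHAEventReason']}"
    )


def modes_differ(varbinds):
    """Return True when fmlHAMode and fmlHAEffectiveMode disagree."""
    return varbinds["fmlHAMode"] != varbinds["fmlHAEffectiveMode"]


# Sample values are invented for illustration only.
sample = {
    "fmlSysSerial": "FE-EXAMPLE0001",
    "fmlHAEventId": 3,
    "fmlHAUnitIp": "192.0.2.10",
    "fmlHAEventReason": "heartbeat lost",
}
print(summarize_ha_trap(sample))
```

    A difference between fmlHAMode and fmlHAEffectiveMode indicates that a failure has occurred, so modes_differ can serve as a simple condition for raising an alert.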

    How to use HA

    In general, to enable and configure HA, you should perform the following:

    1. If the HA cluster will use FortiGuard Antivirus and/or FortiGuard Antispam services, license all FortiMail units in the HA group for the FortiGuard Antispam and FortiGuard Antivirus services, and register them with the Fortinet Technical Support web site, https://support.fortinet.com/.
    2. Physically connect the FortiMail units that will be members of the HA cluster.

      You must connect at least one of their network interfaces for heartbeat and synchronization traffic between members of the cluster. For reliability reasons, Fortinet recommends that you connect both a primary and a secondary heartbeat interface, and that they be connected directly or through a dedicated switch that is not connected to your overall network.

    3. For config-only clusters, configure each member of the cluster to store mail data on a NAS server that supports NFS connections (active-passive groups may also use a NAS server, but do not require it). For details, see Selecting the mail data storage location.
    4. On each member of the cluster:
    • Enable the HA mode that you want to use (either active-passive or config-only) and select whether the individual member will act as a primary unit or secondary unit within the cluster. For information about the differences between the HA modes, see About high availability.
    • Configure the local IP addresses of the primary and secondary heartbeat and synchronization network interfaces.
    • For active-passive clusters, configure the behavior on failover, and how the network interfaces should be configured for whichever FortiMail unit is currently acting as the primary unit. Additionally, if the FortiMail units store mail data on a NAS, disable mail data synchronization between members.
    • For config-only clusters, if the FortiMail unit is a primary unit, configure the IP addresses of its secondary units; if the FortiMail unit is a secondary unit, configure the IP address of its primary unit.

      For details, see Configuring the HA mode and group.

    5. If the HA cluster is active-passive and you want to trigger failover when hardware or a service fails, even if the heartbeat connection is still functioning, configure service monitoring. For details, see Configuring service-based failover.
    6. Monitor the status of each cluster member. For details, see Monitoring the HA status. To monitor HA events through log messages and/or alert email, you must first enable logging of HA activity events. For details, see Logs, reports and alerts.

    See also

    About the heartbeat and synchronization

    Centrally monitoring the HA cluster

    Monitoring the HA status

    The Status tab in the High Availability submenu shows the configured HA mode of operation of a FortiMail unit in an HA group. You can also manually initiate synchronization and reset the HA mode of operation. A reset may be required if a FortiMail unit’s effective HA mode of operation differs from its configured HA mode of operation, such as after a failover when a configured primary unit is currently acting as a secondary unit.

    For FortiMail units operating as secondary units, the Status tab also lets you view the status and schedule of the HA synchronization daemon.

    Appearance of the Status tab varies by:

    • whether the HA group is active-passive or config-only
    • whether the FortiMail unit is configured as a primary unit or secondary unit
    • whether a failover has occurred (active-passive only)

    If HA is disabled, this tab displays:

    HA mode is currently disabled

    Before you can use the Status tab, you must first enable and configure HA. For details, see Configuring the HA mode and group.

    To view the HA mode of operation status, go to System > High Availability > Status.

    Viewing HA status

    GUI item

    Description

    Configured Operating Mode

    Displays the HA operating mode that you configured, either:

    • Primary: Configured to be the primary unit of an active-passive group.
    • Secondary: Configured to be the secondary unit of an active-passive group.
    • Config-primary: Configured to be the primary unit of a config-only group.
    • Config-secondary: Configured to be a secondary unit of a config-only group.

    For information on configuring the HA operating mode, see HA mode.

    After a failure, the FortiMail unit may not be acting in its configured HA operating mode. For details, see Using high availability (HA).

    Effective Operating Mode

    Displays the mode that the unit is currently operating in, either:

    • Primary: Acting as primary unit.
    • Secondary: Acting as secondary unit.
    • Off: For primary units, this indicates that service/interface monitoring has detected a failure and has taken the primary unit offline, triggering failover. For secondary units, this indicates that synchronization has failed once; a subsequent failure will trigger failover. For details, see On failure and Using high availability (HA).
    • Failed: Service/network interface monitoring has detected a failure and the diagnostic connection is currently determining whether the problem has been corrected or failover is required. For details, see On failure.

    The configured HA operating mode matches the effective operating mode unless a failure has occurred.

    For example, after a failover, a FortiMail unit configured to operate as a secondary unit could be acting as a primary unit.

    For explanations of combinations of configured and effective HA modes of operation, see Monitoring the HA status. For information on restoring the FortiMail unit to an effective HA operating mode that matches the configured operating mode, see Using high availability (HA).

    This option appears only if the FortiMail unit is a member of an active-passive HA group.

    Detail Status

    This table is viewable, when HA is configured, by all HA units (primary/secondary, and config-primary/config-secondary):

    • IP: IP address of HA cluster members.

    • SN: Serial number of HA cluster member.

    • Secondary: Displays the configuration synchronization status of the secondary/config-secondary unit.

    • Primary: Displays the configuration synchronization status of the primary/config-primary unit.

    • Status: Displays whether or not the HA cluster is synchronized.

    • Last Seen: Displays the last time the primary unit’s HA daemon checked to make sure that the secondary unit is operating correctly.

      Monitoring occurs through the heartbeat link between the primary and secondary units. For details, see HA base port.

    Action

    Displays the actions you can take, depending on the context:

    • Start configuration sync: Click to manually initiate synchronization of the configurations. For information on items that are not synchronized, see Configuration settings that are not synchronized.

    • Switch to secondary/primary mode: Option depends on HA unit's role; click to manually switch the effective HA operating mode of the primary unit so that it becomes a secondary unit, or vice versa.

    • Restart the HA system: Click to restart HA processes after they have been halted due to detection of a failure by service monitoring. For details, see On failure, Configuring service-based failover, and Restarting the HA processes on a stopped primary unit.

      This option appears only if the FortiMail unit is configured to operate as the primary unit, but its effective HA operating mode is off.

    Combinations of configured and effective HA modes of operation

    Configured operating mode

    Effective operating mode

    Description

    Primary

    Primary

    Normal for the primary unit of an active-passive HA group.

    Secondary

    Secondary

    Normal for the secondary unit of an active-passive HA group.

    Primary

    Off

    The primary unit has experienced a failure, or the FortiMail unit is in the process of switching to operating in HA mode.

    HA processes and email processing are stopped.

    Secondary

    Off

    The secondary unit has detected a failure, or the FortiMail unit is in the process of switching to operating in HA mode.

    After the secondary unit starts up and connects with the primary unit to form an HA group, the first configuration synchronization may fail in special circumstances. To prevent both the secondary and primary units from simultaneously acting as primary units, the effective HA mode of operation becomes off.

    If subsequent synchronization fails, the secondary unit’s effective HA mode of operation becomes primary.

    Primary

    Failed

    The remote service monitoring or local network interface monitoring on the primary unit has detected a failure, and will attempt to connect to the other FortiMail unit. If the problem that caused the failure has been corrected, the effective HA mode of operation switches from failed to secondary, or to match the configured HA mode of operation, depending on the On failure setting.

    Additionally, if the HA group is operating in transparent mode, and if the effective HA mode of operation changes to failed, the network interface IP/netmask on the secondary unit displays bridging (waiting for recovery). For details, see Configuring the network interfaces.

    Primary

    Secondary

    The primary unit has experienced a failure but then returned to operation. When the failure occurred, the unit configured to be the secondary unit became the primary unit. When the unit configured to be the primary unit restarted, it detected the new primary unit and so switched to operating as the secondary unit.

    Secondary

    Primary

    The secondary unit has detected that the FortiMail unit configured to be the primary unit failed. When the failure occurred, the unit configured to be the secondary unit became the primary unit.

    Config primary

    N/A

    Normal for the primary unit of a config-only HA group.

    Config secondary

    N/A

    Normal for the secondary unit of a config-only HA group.
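    For monitoring scripts, the combinations above can be expressed as a lookup table. The following Python sketch is illustrative only: the dictionary, the function names, and the summary strings are hypothetical paraphrases of the table and are not values reported by FortiMail. Config-only units have no effective mode, which the sketch represents as None.

```python
# (configured mode, effective mode) -> short paraphrase of the table above.
MODE_COMBINATIONS = {
    ("primary", "primary"): "normal for the primary unit",
    ("secondary", "secondary"): "normal for the secondary unit",
    ("primary", "off"): "primary unit failed or is switching to HA mode",
    ("secondary", "off"): "secondary unit detected a failure or is switching to HA mode",
    ("primary", "failed"): "monitoring on the primary unit detected a failure",
    ("primary", "secondary"): "primary unit failed, returned, and now acts as secondary",
    ("secondary", "primary"): "secondary unit took over after the primary unit failed",
    ("config primary", None): "normal for a config-only primary unit",
    ("config secondary", None): "normal for a config-only secondary unit",
}


def describe(configured, effective=None):
    """Look up a combination; unlisted pairs return a fixed fallback string."""
    key = (configured.lower(), effective.lower() if effective else None)
    return MODE_COMBINATIONS.get(key, "combination not listed in the table")


def roles_match(configured, effective=None):
    """True when the unit acts in its configured role (or is config-only)."""
    return effective is None or configured.lower() == effective.lower()


print(describe("Primary", "Secondary"))
print(roles_match("Primary", "Secondary"))
```

    A unit for which roles_match returns False is not acting in its configured role, which is the situation where resetting the HA mode of operation may be required.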

    See also

    About the heartbeat and synchronization

    About logging, alert email and SNMP in HA

    Storing mail data on a NAS server

    Configuring the HA mode and group

    Configuring service-based failover

    Example: Active-passive HA group in gateway mode

    Example: Failover scenarios

    Restarting the HA processes on a stopped primary unit

    If you configured service monitoring on an active-passive HA group (see Configuring service-based failover) and either the primary unit or the secondary unit detects a service failure on the primary unit, the primary unit changes its effective HA mode of operation to off, stops processing email, and halts all of its HA processes.

    Resolving the problem could be as simple as reconnecting the cable to a network interface such as port2. After resolving the problem that caused the failure, use the following steps to restart the HA processes on the stopped primary unit.

    To restart a stopped primary unit
    1. Log in to the web-based manager of the primary unit.
    2. Go to System > High Availability > Status.
    3. Under Action, click Restart the HA system.

    The primary unit restarts and rejoins the HA group.

    If a failover has occurred due to processes being stopped on the primary unit, and the secondary unit is currently acting as the primary unit, you can restore the primary and secondary units to acting in their configured roles. For details, see Using high availability (HA).

    See also

    Monitoring the HA status

    Configuring service-based failover

    Example: Active-passive HA group in gateway mode

    Configuring the HA mode and group

    The Configuration tab in the System > High Availability submenu lets you configure the high availability (HA) options, including:

    • enabling HA
    • selecting whether the HA group is active-passive or config-only in style
    • whether this individual FortiMail unit will act as a primary unit or a secondary unit in the cluster
    • network interfaces that will be used for heartbeat and synchronization
    • service monitor

    Caution

    For config-only HA, if the FortiMail unit is operating in server mode, you must store mail data externally, on a NAS server. Failure to store mail data externally could result in mailboxes and other data scattered over multiple FortiMail units. For details on configuring NAS, see Storing mail data on a NAS server and Selecting the mail data storage location.

    For an explanation of active-passive and config-only, see About high availability.

    HA settings, with the exception of Virtual IP Address settings, are not synchronized and must be configured separately on each primary and secondary unit.

    You must maintain the physical link between the heartbeat and synchronization network interfaces. These connections enable cluster members to detect the responsiveness of other members, and to synchronize data. If they are interrupted, normal operation will be interrupted and, for active-passive HA groups, a failover will occur. For more information on heartbeat and synchronization, see About the heartbeat and synchronization.

    For an active-passive HA group, or a config-only HA group consisting of only two FortiMail units, directly connect the heartbeat network interfaces using a crossover Ethernet cable. For a config-only HA group consisting of more than two FortiMail units, connect the heartbeat network interfaces through a switch, and do not connect this switch to your overall network.

    To configure HA options
    1. Go to System > High Availability > Configuration.

      The appearance of sections and the options in them vary greatly with your choice in the Mode of operation drop-down list.

    2. Configure the sections described below, as applicable.
    3. Click Apply.

    Configuring the primary HA options

    Go to System > High Availability > Configuration and click the arrow to expand the HA configuration section, if needed. The options presented vary greatly depending on your choice in the Mode of operation drop-down list.

    HA main options

    GUI item

    Description

    HA mode

    Enables or disables HA, selects active-passive or config-only HA, and selects the initial configured role of this FortiMail unit in the HA group.

    • Off: The FortiMail unit is not operating in HA mode.
    • Primary: The FortiMail unit is the primary unit in an active-passive HA group.
    • Secondary: The FortiMail unit is the secondary unit in an active-passive HA group.
    • Config-primary: The FortiMail unit is the primary unit in a config-only HA group.
    • Config-secondary: The FortiMail unit is a secondary unit in a config-only HA group.

    On failure

    Select one of the following behaviors of the primary unit when it detects a failure, such as on a power failure or from service/interface monitoring.

    • switch off: Do not process email or join the HA group until you manually select the effective operating mode (see Using high availability (HA)).
    • wait for recovery then restore original role: On recovery, the failed primary unit’s effective HA mode of operation returns to its configured primary role. This also means that the secondary unit must hand the primary role back to the primary unit. This behavior may be useful if the cause of failure is temporary and rare, but may cause problems if the cause of failure is permanent or persistent.
    • wait for recovery then restore secondary role: On recovery, the failed primary unit’s effective HA mode of operation becomes secondary, and the secondary unit continues to hold the primary role. The former primary unit then synchronizes the content of its MTA queue directories with the current primary unit. The new primary unit can then deliver email that existed in the former primary unit’s MTA queue at the time of the failover. For information on manually restoring the FortiMail unit to acting in its configured HA mode of operation, see Using high availability (HA).

    In most cases, you should select the wait for recovery then restore secondary role option.

    This option appears only if HA mode is primary.

    Shared password

    Enter an HA password for the HA group. You must configure the same Shared password value on both the primary and secondary units.

    Enable centralized monitor

    Enable or disable the central statistics service.

    Once enabled, administrators on the primary HA unit can monitor the state and activity of each HA cluster member, including CPU, memory, and disk usage, email throughput, and other statistic summaries.

    This feature can also be enabled in the CLI by enabling central-statistics under config system ha. For more information, see the FortiMail CLI Reference.

    For more information, see Centrally monitoring the HA cluster.

    Configuring the primary configuration IP

    If you are configuring the unit as the secondary unit in a config-only group, go to System > High Availability > Configuration to configure the primary IP address.

    In the Primary IP address field, enter the IP of the primary heartbeat network interface of the primary unit. The secondary unit synchronizes only with this primary unit’s IP address.

    Configuring the advanced options

    Go to System > High Availability > Configuration to configure the advanced options.

    For config-only groups, just the HA base port option appears.

    HA advanced options

    GUI item

    Description

    Synchronize mail data directory

    Synchronize system quarantine, email archives, email users’ mailboxes (server mode only), preferences, and per-recipient quarantines.

    Unless the HA cluster stores its mail data on a NAS server, you should configure the HA cluster to synchronize mail directories.

    If mail data changes frequently, you can manually initiate a data synchronization when significant changes are complete. For details, see Using high availability (HA).

    Synchronize MTA queue directory

    Synchronize the mail queue of the FortiMail unit. For more information on the mail queue, see Managing the mail queue.

    Caution: If the primary unit experiences a hardware failure and you cannot restart it, and if this option is disabled, MTA queue directory data could be lost.

    Note: Enabling this option can affect the FortiMail unit’s performance, because periodic synchronization of the mail queue can be processor and bandwidth-intensive. Additionally, because the content of the MTA queue directories is very dynamic, periodically synchronizing MTA queue directories between FortiMail units may not guarantee against loss of all email in those directories. Even if MTA queue directory synchronization is disabled, after a failover, a separate synchronization mechanism may successfully prevent loss of MTA queue data. For details, see Synchronization of MTA queue directories after a failover.

    HA base port

    Enter the first of four TCP port numbers that will be used for:

    • the heartbeat signal
    • synchronization control
    • data synchronization
    • configuration synchronization

    Note: For active-passive groups, in addition or alternatively to configuring the heartbeat, you can configure service monitoring. For details, see Configuring service-based failover.

    Note: In addition to automatic immediate and periodic configuration synchronization, you can also manually initiate synchronization. For details, see Using high availability (HA).

    Heartbeat lost threshold

    Enter the total span of time, in seconds, for which the primary unit can be unresponsive before it triggers a failover and the secondary unit assumes the role of the primary unit.

    The heartbeat will continue to check for availability once per second. To prevent premature failover when the primary unit is simply experiencing very heavy load, configure a total threshold of three (3) seconds or more to allow the secondary unit enough time to confirm unresponsiveness by sending additional heartbeat signals.

    Note: If the failure detection time is too short, the secondary unit may falsely detect a failure during periods of high load.

    Caution: If the failure detection time is too long, the primary unit could fail and a delay in detecting the failure could mean that email is delayed or lost. Decrease the failure detection time if email is delayed or lost because of an HA failover.

    Remote services as heartbeat

    Enable to use remote service monitoring as a secondary HA heartbeat. If enabled and both the primary and secondary heartbeat links fail or become disconnected, if remote service monitoring still detects that the primary unit is available, a failover will not occur.

    Note: The remote service check is only applicable to temporary heartbeat link failures. If the HA process restarts due to a system reboot or HA daemon restart, physical heartbeat connections are checked first. If the physical connections are not found, remote service monitoring no longer takes effect.

    Note: Using remote services as heartbeat provides HA heartbeat only, not synchronization. To avoid synchronization problems, you should not use remote service monitoring as a heartbeat for extended periods. This feature is intended only as a temporary heartbeat solution that operates until you reestablish a normal primary or secondary heartbeat link.
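The interaction between Heartbeat lost threshold and Remote services as heartbeat can be sketched as a small decision function. This is only an illustrative model of the behavior described in the table above, not FortiMail's implementation; the class name, field names, and function name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatView:
    """What the secondary unit currently observes (hypothetical fields)."""
    primary_link_silence_s: float    # seconds since the last heartbeat on the primary link
    secondary_link_silence_s: float  # seconds since the last heartbeat on the secondary link
    remote_service_ok: bool          # True if remote service monitoring still reaches the primary unit


def should_fail_over(view, lost_threshold_s, remote_services_as_heartbeat):
    """Return True if the secondary unit should assume the primary role."""
    primary_lost = view.primary_link_silence_s >= lost_threshold_s
    secondary_lost = view.secondary_link_silence_s >= lost_threshold_s
    # As long as one heartbeat link is still alive, no failover occurs.
    if not (primary_lost and secondary_lost):
        return False
    # Both links are lost: remote services may act as a temporary heartbeat.
    if remote_services_as_heartbeat and view.remote_service_ok:
        return False
    return True
```

For example, with a threshold of 3 seconds, `should_fail_over(HeartbeatView(5, 5, True), 3, True)` returns `False` because remote service monitoring still sees the primary unit, while the same observation with the option disabled returns `True`.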

    Configuring the secondary system options

    This section appears only when the mode of operation is set to config-primary under System > High Availability > Configuration.

    HA peer options

    GUI item

    Description

    IP address

    Double-click in order to modify, then enter the IP address of the primary network interface on that secondary unit.

    Create

    Click to add a secondary unit to the list of Peer systems, then double-click its IP address.

    The primary unit synchronizes only with secondary units in the list of Peer systems.

    Delete

    Click the row corresponding to a peer IP address, then click this button to remove that secondary unit from the HA group.

    See also

    About the heartbeat and synchronization

    About logging, alert email and SNMP in HA

    Storing mail data on a NAS server

    Configuring service-based failover

    Example: Active-passive HA group in gateway mode

    Example: Failover scenarios

    Storing mail data on a NAS server

    For FortiMail units operating in server mode as a config-only HA group, you must store mail data on a NAS server instead of locally. If mail data is stored locally, email users’ messages and other mail data could be scattered across multiple FortiMail units.

    Even if your FortiMail units are not operating in server mode with config-only HA, however, storing mail data on a NAS server may have a number of benefits for your organization. For example, backing up your NAS server regularly can help prevent loss of mail data. Also, if your FortiMail unit experiences a temporary failure, you can still access the mail data on the NAS server. When the FortiMail unit restarts, it can usually continue to access and use the mail data stored on the NAS server.

    For config-only HA groups using a network attached storage (NAS) server, only the primary unit sends quarantine reports to email users. The primary unit also acts as a proxy between email users and the NAS server when email users use FortiMail webmail to access quarantined email and to configure their own Bayesian filters.

    For active-passive HA groups, the primary unit reads and writes all mail data to and from the NAS server in the same way as a standalone unit. If a failover occurs, the new primary unit uses the same NAS server for mail data. The new primary unit can access all mail data that the original primary unit stored on the NAS server. So if you are using a NAS server to store mail data, after a failover, the new primary unit continues operating with no loss of mail data.

    Note

    If the FortiMail unit is a member of an active-passive HA group, and the HA group stores mail data on a remote NAS server, disable mail data synchronization to prevent duplicate mail data traffic.

    For instructions on storing mail data on a NAS server, see Selecting the mail data storage location.

    See also

    About the heartbeat and synchronization

    Configuring the HA mode and group

    Configuring interface monitoring

    In active-passive HA mode, Interface monitor checks the local interfaces on the primary unit. If a malfunctioning interface is detected, a failover will be triggered.

    To configure interface monitoring
    1. Go to System > High Availability > Configuration.
    2. Select primary or secondary as the mode of operation.
    3. Expand the Interface area, if required.
    4. Click on the port/interface name to configure the interface. For details, see Configuring the network interfaces.

      Note: The interface IP address must be different from, but on the same subnet as, the IP addresses of the other heartbeat network interfaces of other members in the HA group.

      When configuring other FortiMail units in the HA group, use this value as the:

      • Remote peer IP (for active-passive groups)
      • Primary configuration (for secondary units in config-only groups)
      • Peer systems (for the primary unit on config-only groups)

    5. Select a row in the table and click Edit to configure the following HA settings on the interface.

    GUI item

    Description

    Port

    Displays the interface name you’re configuring.

    Enable port monitor

    Enable to monitor a network interface for failure. If the port fails, the primary unit will trigger a failover.

    Heartbeat status

    Specify if this interface will be used for HA heartbeat and synchronization.

    • Disable

    Do not use this interface for HA heartbeat and synchronization.

    • Primary

    Select the primary network interface for heartbeat and synchronization traffic. For more information, see About the heartbeat and synchronization.

    This network interface must be connected directly or through a switch to the Primary heartbeat network interface of other members in the HA group.

    • Secondary

    Select the secondary network interface for heartbeat and synchronization traffic. For more information, see About the heartbeat and synchronization.

    The secondary heartbeat interface is the backup heartbeat link between the units in the HA group. If the primary heartbeat link is functioning, the secondary heartbeat link is used for the HA heartbeat. If the primary heartbeat link fails, the secondary link is used for the HA heartbeat and for HA synchronization.

    This network interface must be connected directly or through a switch to the Secondary heartbeat network interfaces of other members in the HA group.

    Caution: Using the same network interface for both HA synchronization/heartbeat traffic and other network traffic could result in issues with heartbeat and synchronization during times of high traffic load, and is not recommended.

    Note: In general, you should isolate the network interfaces that are used for heartbeat traffic from your overall network. Heartbeat and synchronization packets contain sensitive configuration information, are latency-sensitive, and can consume considerable network bandwidth.

    Peer IP address

    Enter the IP address of the matching heartbeat network interface of the other member of the HA group.

    For example, if you are configuring the primary unit’s primary heartbeat network interface, enter the IP address of the secondary unit’s primary heartbeat network interface.

    Similarly, for the secondary heartbeat network interface, enter the IP address of the other unit’s secondary heartbeat network interface.

    For information about configuration synchronization and what is not synchronized, see About the heartbeat and synchronization.

    This option appears only for active-passive HA.

    Peer IPv6 address

    Enter the peer IPv6 address in the active-passive HA group. For IPv6 support, see About IPv6 Support.

    Virtual IP action

    Select whether and how to configure the IP addresses and netmasks of the FortiMail unit whose effective HA mode of operation is currently primary.

    For example, a primary unit might be configured to receive email traffic through port1 and receive heartbeat and synchronization traffic through port5 and port6. In that case, you would configure the primary unit to set the IP addresses or add virtual IP addresses for port1 of the secondary unit on failover in order to mimic that of the primary unit.

    • Ignore: Do not change the network interface configuration on failover, and do not monitor. For details on service monitoring for network interfaces, see Configuring the network interfaces.
    • Set: Add the specified virtual IP address and netmask to the network interface on failover. Normally, you will configure your network (MX records, firewall policies, routing and so on) so that clients and mail services use the virtual IP address. Both originating and reply traffic uses the virtual IP address. All replies to sessions with the virtual IP address include the virtual IP address as the source address. Originating traffic, however, will use the network interface’s actual IP address as the source address. Unlike set interface IP/netmask, this option results in the network interface having two IP addresses: the actual and the virtual. For examples, see Example: Active-passive HA group in gateway mode. In v3.0 MR2 and older releases, the behavior is different -- the originating traffic uses the actual IP address, instead of the virtual IP address.
    • Bridge: Include the network interface in the Layer 2 bridge. While the effective HA mode of operation is secondary, the interface is deactivated and cannot process traffic, preventing Layer 2 loops. Then, when the effective HA mode of operation becomes primary, the interface is activated again and can process traffic. This option appears only if the FortiMail unit is operating in transparent mode. This option is not available for Port1 and the ports not in the bridge group. For information on configuring bridging network interfaces, see Editing network interfaces.

    Note: Settings in this section are synchronizable. Configure the primary unit, then synchronize it to the secondary unit. For details, see Using high availability (HA).

    Virtual IP address

    Enter the virtual IPv4 address for this interface.

    Virtual IPv6 address

    Enter the virtual IPv6 address for this interface. For IPv6 support, see About IPv6 Support.
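The addressing rule from the procedure above (each heartbeat interface IP must differ from, but share a subnet with, the matching heartbeat interfaces of the other members) can be checked before you enter the values. The following is a minimal sketch using Python's standard `ipaddress` module; the function name and the example addresses are hypothetical and are not taken from any FortiMail tool.

```python
import ipaddress


def check_heartbeat_addresses(local, peers):
    """Return a list of problems found; an empty list means the plan looks consistent.

    local and peers are strings in address/prefix form, such as "10.0.0.1/24".
    """
    local_if = ipaddress.ip_interface(local)
    problems = []
    for peer in peers:
        peer_if = ipaddress.ip_interface(peer)
        if peer_if.ip == local_if.ip:
            problems.append(f"{peer}: duplicates the local address")
        elif peer_if.network != local_if.network:
            problems.append(f"{peer}: not on subnet {local_if.network}")
    return problems
```

Calling `check_heartbeat_addresses("10.0.0.1/24", ["10.0.0.2/24"])` returns an empty list, while a peer of `"10.0.1.2/24"` is reported as being outside `10.0.0.0/24`.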

    Configuring service-based failover

    Go to System > High Availability > Configuration to configure remote service monitoring, local network interface monitoring, and local hard drive monitoring.

    Note

    Service monitoring is not available for config-only HA groups.

    HA service monitoring settings are not synchronized and must be configured separately on each primary and secondary unit.

    With remote service monitoring, the secondary unit confirms that it can connect to the primary unit over the network using SMTP service, POP service (POP3), and Web service (HTTP) connections. If you configure the HA pair in server mode, the IMAP service can also be checked.

    With local network interface monitoring and local hard drive monitoring, the primary unit monitors its own network interfaces and hard drives.

    If service monitoring detects a failure, the effective HA operating mode of the primary unit switches to off or failed (depending on the On failure setting) and, if configured, the FortiMail units send HA event alert email, record HA event log messages, and send HA event SNMP traps. A failover then occurs, and the effective HA operating mode of the secondary unit switches to primary. For information on the On failure option, see Configuring the HA mode and group. For information on the effective HA operating mode, see Monitoring the HA status.

    For example, if service monitoring detects that port2 on the primary unit has failed, the primary unit records a log message similar to the following.

    date=2005-11-18 time=18:20:31 device_id=FE-4002905500194 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: local problem detected (port2), shutting down"

    The primary unit also sends an alert email similar to the following:

    Subject: monitord: local problem detected (port2), shutting down [primary-host-name]

    This is the FortiMail HA unit at 10.0.0.1.

    A local problem (port2) has been detected, telling remote to take over and shutting down.

    Remote service monitoring is useful in addition to, or sometimes as a backup alternative to, the heartbeat. While the heartbeat tests for the general responsiveness of the primary unit, it does not test for the failure of individual services that email users may be using, such as POP3 or webmail. The heartbeat also does not monitor for the failure of network interfaces through which non-heartbeat traffic occurs. In this way, configuring remote service monitoring provides more specific failover monitoring. Additionally, if the heartbeat link is briefly disconnected, enabling HA service monitoring can prevent a false failover by acting as a temporary secondary heartbeat. For information on treating service monitoring as a secondary heartbeat, see Remote services as heartbeat.

    To configure service monitoring
    1. Go to System > High Availability > Configuration.
    2. Select primary or secondary as the mode of operation.
    3. Expand the service monitor area, if required.
    4. Select a row in the table and click Edit to configure it.
    5. For Remote SMTP, Remote IMAP, Remote POP, and Remote HTTP services, configure the following:

      GUI item

      Description

      Enable

      Select to enable connection responsiveness tests for the service.

      Name

      Displays the service name.

      Remote IP

      Enter the peer IP address.

      Port

      Enter the port number of the peer service.

      Timeout

      Enter the timeout period for one connection test.

      Interval

      Enter the frequency of the tests.

      Retries

      Enter the number of consecutively failed tests that are allowed before the primary unit is deemed unresponsive and a failover occurs.

    6. For interface monitoring and local hard drive monitoring, configure the following:

      GUI item

      Description

      Enable

      Enable local hard drive monitoring to check if the local hard drive is still accessible, or if the mail data disk is almost full. If the hard disk is not responsive, or if the mail data disk is 95 percent full, a failover will occur.

      Interface monitoring is enabled when you configure interface monitoring. See Configuring interface monitoring.

      Network interface monitoring tests all active network interfaces whose:

      • Virtual IP action setting is not Ignore
      • Configuring interface monitoring setting is enabled

      For details, see Configuring interface monitoring and Virtual IP action.

      Interval

      Enter the frequency of the test.

      Retries

      Specify the number of consecutively failed tests that are allowed before the local interface or hard drive is deemed unresponsive and a failover occurs.
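The Timeout and Retries settings described above can be illustrated with a short sketch: one function performs a single TCP connection test, and another decides when a run of consecutive failures is long enough to treat the peer as unresponsive. This is only a rough model; the function names are hypothetical, and it assumes that the peer is deemed unresponsive once the number of consecutive failures reaches the Retries value, which the source does not state exactly. In a real monitor, the Interval setting would space out the individual tests.

```python
import socket


def tcp_probe(host, port, timeout_s):
    """One connection test: True if a TCP connection opens within timeout_s seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def is_unresponsive(results, retries):
    """Decide from a sequence of test results (oldest first) whether to fail over.

    Assumption: the peer counts as unresponsive once `retries` tests in a row
    have failed. A successful test resets the count.
    """
    consecutive = 0
    for ok in results:
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= retries:
            return True
    return False
```

With `retries=3`, the history `[True, False, False, False]` triggers a failover, while `[False, False, True, False]` does not because the successful test resets the count.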

    See also

    About the heartbeat and synchronization

    About logging, alert email and SNMP in HA

    Storing mail data on a NAS server

    Configuring the HA mode and group

    Example: Active-passive HA group in gateway mode

    Example: Failover scenarios

    Example: Failover scenarios

    This section describes basic FortiMail active-passive HA failover scenarios. For each scenario, refer to the HA group shown in the following figure. To simplify the descriptions of these scenarios, the following abbreviations are used:

    • P1 is the configured primary unit.
    • S2 is the configured secondary unit.
    Example active-passive HA group

    This section contains the following HA failover scenarios:


    Failover scenario 1: Temporary failure of the primary unit

    In this scenario, the primary unit (P1) fails because of a software failure or a recoverable hardware failure (in this example, the P1 power cable is unplugged). HA logging and alert email are configured for the HA group.

    When the secondary unit (S2) detects that P1 has failed, S2 becomes the new primary unit and continues processing email.

    Here is what happens during this process:

    1. The FortiMail HA group is operating normally.
    2. The power is accidentally disconnected from P1.
    3. S2’s primary heartbeat test detects that P1 has failed.

      How soon this happens depends on the HA daemon configuration of S2.

    4. The effective HA operating mode of S2 changes to primary.
    5. S2 sends an alert email similar to the following, indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to primary.

      This is the HA machine at 172.16.5.11.

      The following event has occurred

      ‘PRIMARY heartbeat disappeared’

      The state changed from ‘SECONDARY’ to ‘PRIMARY’

    6. S2 records the following event log messages (among others) indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to primary.

      2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: peer stop responding (heartbeat), assuming PRIMARY role"

      2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop stopping"

      2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop stopping"

      2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop stopping"

      2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop starting, entering primary mode"

      2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop starting, entering primary mode"

      2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop starting, entering PRIMARY mode"
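The event log messages above use a `key=value` layout with the message text in double quotes, so they can be split into fields with a few lines of code when you want to filter HA events. The following is a minimal sketch using Python's standard `shlex` module; the function name is hypothetical, and the sketch assumes every log line follows the layout shown in this example.

```python
import shlex


def parse_ha_log(line):
    """Split one HA event log line into a dict of its key=value fields.

    Tokens without "=" (such as the leading date and time) are skipped.
    """
    fields = {}
    for token in shlex.split(line):  # shlex keeps the quoted msg text together
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


line = (
    '2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=notice '
    'user=ha ui=ha action=unknown status=success '
    'msg="monitord: peer stop responding (heartbeat), assuming PRIMARY role"'
)
fields = parse_ha_log(line)
```

Here `fields["subtype"]` is `"ha"` and `fields["msg"]` holds the full message text without the surrounding quotes.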

    Recovering from temporary failure of the primary unit

    After P1 recovers from the hardware failure, what happens next to the HA group depends on P1’s HA On failure settings under System > High Availability > Configuration.

    HA On Failure settings

    • switch off

    P1 will not process email or join the HA group until you manually select the effective HA operating mode (see Using high availability (HA)).

    • wait for recovery then restore original role

    On recovery, P1’s effective HA operating mode resumes its configured primary role. This also means that S2 needs to give back the primary role to P1. This behavior may be useful if the cause of failure is temporary and rare, but may cause problems if the cause of failure is permanent or persistent.

    In this case, S2 sends another alert email similar to the following:

    This is the HA machine at 172.16.5.11.

    The following event has occurred

    ‘SECONDARY asks us to switch roles (recovery after a restart)’

    The state changed from ‘PRIMARY’ to ‘SECONDARY’

    After recovery, P1 also sends out an alert email similar to the following:

    This is the HA machine at 172.16.5.10.

    The following critical event was detected

    The system was shutdown!

    • wait for recovery then restore secondary role

    On recovery, P1’s effective HA operating mode becomes secondary, and S2 continues to assume the primary role. P1 then synchronizes the content of its MTA queue directories with the current primary unit, S2. S2 can then deliver email that existed in P1’s MTA queue directory at the time of the failover. For information on manually restoring the FortiMail unit to acting in its configured HA mode of operation, see Using high availability (HA).
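The three On failure outcomes can be summarized as a lookup from the setting to the effective HA operating modes of P1 and S2 after P1 recovers. This sketch only restates the descriptions above; the enum and function names are hypothetical, and the mode strings are simplified labels rather than values reported by FortiMail.

```python
from enum import Enum


class OnFailure(Enum):
    SWITCH_OFF = "switch off"
    RESTORE_ORIGINAL = "wait for recovery then restore original role"
    RESTORE_SECONDARY = "wait for recovery then restore secondary role"


def roles_after_recovery(on_failure):
    """Return (P1 effective mode, S2 effective mode) once P1 has recovered."""
    if on_failure is OnFailure.SWITCH_OFF:
        # P1 stays out of the group until an administrator intervenes.
        return ("off", "primary")
    if on_failure is OnFailure.RESTORE_ORIGINAL:
        # S2 gives the primary role back to P1.
        return ("primary", "secondary")
    # RESTORE_SECONDARY: S2 keeps the primary role and P1 rejoins as secondary.
    return ("secondary", "primary")
```

For example, `roles_after_recovery(OnFailure.RESTORE_SECONDARY)` returns `("secondary", "primary")`, which matches the setting recommended for most cases.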

    Failover scenario 2: System reboot or reload of the primary unit

    If you need to reboot or reload (not shut down) P1 for any reason, such as a firmware upgrade or a process restart, by using the CLI commands execute reboot or execute reload <httpd...>, or by clicking System > Reboot from the top-right corner of the GUI:

    • P1 will send a holdoff command to S2 so that S2 will not take over the primary role during P1’s reboot.
    • P1 will also send out an alert email similar to the following:

    This is the HA machine at 172.16.5.10.

    The following critical event was detected

    The system is rebooting (or reloading)!

    • S2 will hold off checking the services and heartbeat with P1. Note that S2 will only hold off for about 15 minutes. If P1 has not come back up by then, S2 will take over the primary role.
    • S2 will send out an alert email, indicating that S2 received the holdoff command from P1.

    This is the HA machine at 172.16.5.11.

    The following event has occurred

    ‘peer rebooting (or reloading)’

    The state changed from ‘SECONDARY’ to ‘HOLD_OFF’

    After P1 is up again:

    • P1 will send another command to S2 and ask S2 to change its state from holdoff to secondary and resume monitoring P1’s services and heartbeat.
    • S2 will send out an alert email, indicating that S2 received instruction commands from P1.

    This is the HA machine at 172.16.5.11.

    The following event has occurred

    ‘peer command appeared’

    The state changed from ‘HOLD_OFF’ to ‘SECONDARY’

    • S2 logs the event in the HA logs.
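The hold-off behavior of S2 in this scenario can be pictured as a small state machine. The sketch below is an illustration only: the state names come from the alert emails above, the event names `"peer rebooting"` and `"peer command appeared"` are shortened from those emails, and the `"tick"` event and the exact 15-minute constant are assumptions (the text says only "about 15 minutes").

```python
HOLD_OFF_LIMIT_S = 15 * 60  # assumption: "about 15 minutes" modeled as exactly 900 seconds


def next_state(state, event, held_for_s=0):
    """Return S2's next state for a given event.

    held_for_s is how long S2 has already been in HOLD_OFF, in seconds.
    """
    if state == "SECONDARY" and event == "peer rebooting":
        return "HOLD_OFF"
    if state == "HOLD_OFF" and event == "peer command appeared":
        return "SECONDARY"
    if state == "HOLD_OFF" and event == "tick" and held_for_s >= HOLD_OFF_LIMIT_S:
        # P1 never came back: S2 takes over the primary role.
        return "PRIMARY"
    return state
```

A normal reboot walks through `SECONDARY`, `HOLD_OFF`, and back to `SECONDARY`; only a hold-off that outlasts the limit ends in `PRIMARY`.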

    Failover scenario 3: System reboot or reload of the secondary unit

    If you need to reboot or reload (not shut down) S2 for any reason, such as a firmware upgrade or a process restart, by using the CLI commands execute reboot or execute reload <httpd...>, or by clicking System > Reboot from the top-right corner of the GUI, the behavior of P1 and S2 is as follows:

    • P1 will send out an alert email similar to the following, informing the administrator of the heartbeat loss with S2.

    This is the HA machine at 172.16.5.10.

    The following event has occurred

    ‘ha: SECONDARY heartbeat disappeared’

    • S2 will send out an alert email similar to the following:

    This is the HA machine at 172.16.5.11.

    The following critical event was detected

    The system is rebooting (or reloading)!

    • P1 will also log this event in the HA logs.
    Caution

    For FortiMail v4.0 and older releases:

    • P1 will not send out the alert email.
    • P1 will log the event in the HA logs.

    Failover scenario 4: System shutdown of the secondary unit

    If you shut down S2:

    • No alert email is sent out from either P1 or S2.
    • P1 will log this event in the HA logs.

    Failover scenario 5: Primary heartbeat link fails

    If the primary heartbeat link fails, such as when the cable becomes accidentally disconnected, and if you have not configured a secondary heartbeat link, the FortiMail units in the HA group cannot verify that other units are operating and assume that the other has failed. As a result, the secondary unit (S2) changes to operating as a primary unit, and both FortiMail units are acting as primary units.

    Two primary units connected to the same network may cause address conflicts on your network because matching interfaces will have the same IP addresses. Additionally, because the heartbeat link is interrupted, the FortiMail units in the HA group cannot synchronize configuration changes or mail data changes.

    Even after reconnecting the heartbeat link, both units will continue operating as primary units. To return the HA group to normal operation, you must connect to the web-based manager of S2 to restore it as the secondary unit.

    1. The FortiMail HA group is operating normally.
    2. The heartbeat link Ethernet cable is accidentally disconnected.
    3. S2’s HA heartbeat test detects that the primary unit has failed.

      How soon this happens depends on the HA daemon configuration of S2.

    4. The effective HA operating mode of S2 changes to primary.
    5. S2 sends an alert email similar to the following, indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to primary.

      This is the HA machine at 172.16.5.11.

      The following event has occurred

      ‘PRIMARY heartbeat disappeared’

      The state changed from ‘SECONDARY’ to ‘PRIMARY’

    6. S2 records the following event log messages (among others) indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to primary.

      2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: peer stop responding (heartbeat), assuming PRIMARY role"

      2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop stopping"

      2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop stopping"

      2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop stopping"

      2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop starting, entering primary mode"

      2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop starting, entering primary mode"

      2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop starting, entering PRIMARY mode"
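The alert emails in these scenarios share a common shape: a line naming the sending unit and, for role changes, a line stating the old and new state. If you collect these emails, a sketch like the following can pull out those two facts. The function and pattern names are hypothetical, and the patterns assume the wording shown in the examples in this section, accepting both curly and straight quotation marks.

```python
import re

_QUOTE = "['\u2018\u2019]"  # straight or curly single quotation mark
STATE_CHANGE = re.compile(
    rf"The state changed from {_QUOTE}(\w+){_QUOTE} to {_QUOTE}(\w+){_QUOTE}"
)
HOST = re.compile(r"This is the HA machine at (\d{1,3}(?:\.\d{1,3}){3})")


def parse_alert(body):
    """Extract the sending unit's IP address and the state transition, if present."""
    host = HOST.search(body)
    change = STATE_CHANGE.search(body)
    return {
        "host": host.group(1) if host else None,
        "from": change.group(1) if change else None,
        "to": change.group(2) if change else None,
    }


body = (
    "This is the HA machine at 172.16.5.11.\n"
    "The following event has occurred\n"
    "\u2018PRIMARY heartbeat disappeared\u2019\n"
    "The state changed from \u2018SECONDARY\u2019 to \u2018PRIMARY\u2019\n"
)
alert = parse_alert(body)
```

For the sample body, `alert` contains the host `172.16.5.11` and a transition from `SECONDARY` to `PRIMARY`. Emails that report only an event, such as a reboot notice, yield `None` for both state fields.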

    Recovering from a heartbeat link failure

    Because the hardware failure is not permanent (that is, the failure of the heartbeat link was caused by a disconnected cable, not a failed port on one of the FortiMail units), you may want to return both FortiMail units to operating in their configured modes when rejoining the failed primary unit to the HA group.

    To return to normal operation after the heartbeat link fails
    1. Reconnect the primary heartbeat interface by reconnecting the heartbeat link Ethernet cable.

      Even though the effective HA operating mode of S2 is primary, S2 continues to attempt to find the other primary unit. When the heartbeat link is reconnected, S2 finds P1 and determines that P1 is also operating as a primary unit. So S2 sends a heartbeat signal to notify P1 to stop operating as a primary unit. The effective HA operating mode of P1 changes to off.

    2. P1 sends an alert email similar to the following, indicating that P1 has stopped operating as the primary unit.

      This is the HA machine at 172.16.5.10

      The following event has occurred

      'SECONDARY asks us to switch roles (user requested takeover)'

      The state changed from 'PRIMARY' to 'OFF'

    3. P1 records the following event log messages (among others) indicating that P1 is switching to off mode.

      2005-11-30 17:13:06 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: remote detected problem, shutting down"

      2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop stopping"

      2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop stopping"

      2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop stopping"

      2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop starting, entering off mode"

      2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop starting, entering off mode"

      The configured HA mode of operation of P1 is primary and the effective HA operating mode of P1 is off.

      The configured HA mode of operation of S2 is secondary and the effective HA operating mode of S2 is primary.

      P1 synchronizes the content of its MTA queue directories to S2. Email in these directories can now be delivered by S2.

    4. Connect to the web-based manager of P1, go to System > High Availability > Status.
    5. Check for synchronization messages.

      Do not proceed to the next step until P1 has synchronized with S2.

    6. Connect to the web-based manager of S2, go to System > High Availability > Status and select click HERE to restore configured operating mode.

      The HA group should return to normal operation. P1 records the following event log message (among others) indicating that S2 asked P1 to return to operating as the primary unit.

      2005-11-30 18:10:00 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: being asked to assume original role"

    7. P1 and S2 synchronize their MTA queue directories. All email in these directories can now be delivered by P1.
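When reviewing a failover, it can help to pull the HA events out of an exported log file. The following is a minimal sketch, assuming the log lines keep the `date time key=value ... msg="..."` layout shown in the samples above. The function names `parse_ha_log` and `entered_mode` are hypothetical helpers written for this illustration and are not part of FortiMail.

```python
import re
import shlex


def parse_ha_log(line: str) -> dict:
    """Split one event log line into date, time, and its key=value fields.

    Assumption: the line follows the layout of the sample messages, and
    the quoted msg value contains no unbalanced quote characters.
    """
    tokens = shlex.split(line)  # keeps the quoted msg="..." value together
    record = {"date": tokens[0], "time": tokens[1]}
    for token in tokens[2:]:
        key, _, value = token.partition("=")
        record[key] = value
    return record


def entered_mode(record: dict):
    """Return the mode named in an 'entering <mode> mode' message, or None."""
    match = re.search(r"entering (\w+) mode", record.get("msg", ""), re.IGNORECASE)
    return match.group(1).lower() if match else None


sample = (
    '2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha '
    'pri=information user=ha ui=ha action=unknown status=success '
    'msg="configd: main loop starting, entering off mode"'
)
record = parse_ha_log(sample)
print(record["subtype"], entered_mode(record))
```

Filtering parsed records on `subtype` equal to `ha` and then reading `entered_mode` gives a quick timeline of the mode changes each daemon reported.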

    Failover scenario 6: Network connection between primary and secondary units fails (remote service monitoring detects a failure)

    Depending on your network configuration, the network connection between the primary and secondary units can fail for a number of reasons. In the network configuration shown in Example active-passive HA group, the connection between port1 of the primary unit (P1) and port1 of the secondary unit (S2) can fail if a network cable is disconnected or if the switch between P1 and S2 fails.

    A more complex network configuration could include a number of network devices between the primary and secondary unit’s non-heartbeat network interfaces. In any configuration, remote service monitoring can only detect a communication failure. Remote service monitoring cannot determine where the failure occurred or the reason for the failure.

    In this scenario, remote service monitoring has been configured to make sure that S2 can connect to P1. The On failure setting located in the HA main configuration section is wait for recovery then restore secondary role. For information on the On failure setting, see On failure. For information about remote service monitoring, see Configuring service-based failover.

    The failure occurs when power to the switch that connects the P1 and S2 port1 interfaces is disconnected. Remote service monitoring detects the failure of the network connection between the primary and secondary units. Because of the On failure setting, P1 changes its effective HA operating mode to failed.

    When the failure is corrected, P1 detects the correction because while operating in failed mode P1 has been attempting to connect to S2 using the port1 interface. When P1 can connect to S2, the effective HA operating mode of P1 changes to secondary and the mail data on P1 will be synchronized to S2. S2 can now deliver this mail. The HA group continues to operate in this manner until an administrator resets the effective HA modes of operation of the FortiMail units.

    1. The FortiMail HA group is operating normally.
    2. The power cable for the switch between P1 and S2 is accidentally disconnected.
    3. S2's remote service monitoring cannot connect to the primary unit.

      How soon this happens depends on the remote service monitoring configuration of S2.

    4. Through the HA heartbeat link, S2 signals P1 to stop operating as the primary unit.
    5. The effective HA operating mode of P1 changes to failed.
    6. The effective HA operating mode of S2 changes to primary.
    7. S2 sends an alert email similar to the following, indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to primary.

      This is the HA machine at 172.16.5.11.

      The following event has occurred

      'PRIMARY remote service disappeared'

      The state changed from 'SECONDARY' to 'PRIMARY'

    8. S2 logs the event (among others) indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to primary.

      2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: peer stop responding (heartbeat), assuming PRIMARY role"

      2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop stopping"

      2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop stopping"

      2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop stopping"

      2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop starting, entering primary mode"

      2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop starting, entering primary mode"

      2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop starting, entering PRIMARY mode"

    9. P1 sends an alert email similar to the following, indicating that P1 has stopped operating in HA mode.

      This is the HA machine at 172.16.5.10.

      The following event has occurred

      'SECONDARY asks us to switch roles (user requested takeover)'

      The state changed from 'PRIMARY' to 'FAILED'

    10. P1 records the following log messages (among others) indicating that P1 is switching to Failed mode.

      2005-11-30 17:13:06 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: remote detected problem, shutting down"

      2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop stopping"

      2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop stopping"

      2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop stopping"

      2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop starting, entering off mode"

      2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop starting, entering failed mode"
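The sequence of effective HA operating modes in this scenario can be summarized as a small state model. The following is an illustrative sketch only: it encodes just the transitions described in this scenario, for the On failure setting wait for recovery then restore secondary role. The names `Mode`, `on_remote_service_failure`, and `on_connection_restored` are hypothetical, and the behavior of other On failure settings is deliberately not modeled.

```python
from enum import Enum


class Mode(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"
    FAILED = "failed"
    OFF = "off"


# The On failure value used in this scenario (taken from the text above).
RESTORE_SECONDARY = "wait for recovery then restore secondary role"


def on_remote_service_failure(p1: Mode, s2: Mode):
    """S2's remote service monitoring can no longer reach P1.

    Per the scenario: S2 signals P1 over the heartbeat link, P1 changes
    to failed, and S2 changes to primary.
    """
    if p1 is Mode.PRIMARY and s2 is Mode.SECONDARY:
        return Mode.FAILED, Mode.PRIMARY
    return p1, s2


def on_connection_restored(p1: Mode, s2: Mode, on_failure: str):
    """P1, while failed, succeeds in connecting to S2 again.

    Only the setting used in this scenario is modeled; any other value
    leaves the modes unchanged in this sketch (an assumption, not
    documented behavior).
    """
    if p1 is Mode.FAILED and on_failure == RESTORE_SECONDARY:
        return Mode.SECONDARY, s2
    return p1, s2


p1, s2 = Mode.PRIMARY, Mode.SECONDARY
p1, s2 = on_remote_service_failure(p1, s2)
print(p1.value, s2.value)
p1, s2 = on_connection_restored(p1, s2, RESTORE_SECONDARY)
print(p1.value, s2.value)
```

After the second transition the roles are swapped relative to the configured modes, which is why an administrator must restore the configured operating modes manually.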

    Recovering from a network connection failure

    Because the network connection failure was not caused by failure of either FortiMail unit, you may want to return both FortiMail units to operating in their configured modes when rejoining the failed primary unit to the HA group.

    To return to normal operation after the network connection fails
    1. Reconnect power to the switch.

      Because the effective HA operating mode of P1 is failed, P1 is using remote service monitoring to attempt to connect to S2 through the switch.

    2. When the switch resumes operating, P1 successfully connects to S2.

      P1 has determined that S2 can connect to the network and process email.

    3. The effective HA operating mode of P1 switches to secondary.
    4. P1 logs the event.

      2009-11-30 16:02:08 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop starting, entering primary mode"

      2009-11-30 16:02:08 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop starting, entering primary mode"

      2009-11-30 16:02:13 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: starting pre-amble"

      2009-11-30 16:02:13 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: ** response from peer, setting to SECONDARY mode"

    5. P1 sends an alert email similar to the following, indicating that P1 is switching its effective HA operating mode to secondary.

      This is the HA machine at 172.16.5.10.

      The following event has occurred

      'SECONDARY asks us to switch roles (user requested takeover)'

      The state changed from 'FAILED' to 'SECONDARY'

    6. P1 synchronizes the content of its MTA queue directories to S2. S2 can now deliver all email in these directories.

      The HA group can continue to operate with S2 as the primary unit and P1 as the secondary unit. However, you can use the following steps to restore each unit to its configured HA mode of operation.

    7. Connect to the web-based manager of P1 and go to System > High Availability > Status.
    8. Check for synchronization messages.

      Do not proceed to the next step until P1 has synchronized with S2.

    9. Connect to the web-based manager of S2, go to System > High Availability > Status and select click HERE to restore configured operating mode.
    10. Connect to the web-based manager of P1, go to System > High Availability > Status and select click HERE to restore configured operating mode.

      P1 should return to operating as the primary unit and S2 should return to operating as the secondary unit.

    11. P1 and S2 synchronize their MTA queue directories again. P1 can now deliver all email in these directories.

    Example: Active-passive HA group in gateway mode

    In this example, two FortiMail-400 units are configured to operate in gateway mode as an active-passive HA group.

    The procedures in this example describe HA configuration necessary to achieve this scenario. Before beginning, verify that both of the FortiMail units are already:

    Virtual IP address for HA failover

    The active-passive HA group is located on a private network with email users and the protected email server. All are behind a FortiGate unit which separates the private network from the Internet. The DNS server, remote email users, and external SMTP servers are located on the Internet.

    For both FortiMail units:

    port1

    • connected to a switch which is connected only to the computer that the FortiMail administrator uses to manage the HA group
    • administrative access occurs through this port

    port3

    • connected to a switch which is connected to the private network and, indirectly, the Internet
    • email connections occur through this port

    port6

    • connected directly to each other using a crossover cable
    • heartbeat and synchronization occurs through this port

    The secondary unit will become the new primary unit when a failover occurs. In order for it to receive the connections formerly destined for the failed primary unit, the new primary unit must adopt the failed primary unit’s IP address. You will configure an HA virtual IP address on port3 for this purpose.

    While the configured primary unit is functional, the HA virtual IP address is associated with its port3 network interface, which receives email connections. After a failover, the HA virtual IP address becomes associated with the new primary unit’s port3. As a result, after a failover, the new primary unit (originally the secondary unit) will then receive and process the email connections.

    This example contains the following topics:

    About standalone versus HA deployment

    If you plan to convert a standalone FortiMail unit to a member of an HA group, first understand how the HA deployment shown in Virtual IP address for HA failover resembles a standalone deployment, how it differs, and which configuration changes it requires.

    Examine the network interface configuration of a standalone FortiMail-400 unit in the following table.

    Example standalone network interface configuration

    | Network interface | IP address | Description |
    | --- | --- | --- |
    | port1 | 192.168.1.5 | Administrative connections to the FortiMail unit. |
    | port2, port4 | Default | Not connected. |
    | port3 | 172.16.1.2 | Email connections to the FortiMail unit; the target of your email DNS A records (No administrative access). |
    | port5 | Default | Not connected. |
    | port6 | Default | Not connected. |

    Similarly, for the HA group, DNS A records should target the IP address of the port3 interface of the primary FortiMail-400 unit. Additionally, administrators should administer each FortiMail unit in the HA group by connecting to the IP address of each FortiMail unit’s port1.

    If a failover occurs, the network must be able to direct traffic to port3 of the secondary unit without reconfiguring the DNS A record target. The secondary unit must cleanly and automatically substitute for the primary unit, as if they were a single, standalone unit.

    Unlike the configuration of the standalone unit, for the HA group to accomplish that substitution, all email connections must use an IP address that transfers between the primary unit and the secondary unit according to which is currently the primary unit. This transferable IP address can be accomplished by configuring the HA group to either:

    • set the IP address of the current primary unit’s network interface
    • add a virtual IP address to the current primary unit’s network interface

    In this example, the HA group uses the method of adding a virtual IP address. Email connections will not use the actual IP address of port3. Instead, all email connections will use only the virtual IP address 172.16.1.2, which is used by port3 of whichever FortiMail unit’s effective HA operating mode is currently primary. During normal HA group operation, this IP address resides on the primary unit. Conversely, after a failover occurs, this IP address resides on the former secondary unit (now the current primary unit).
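The way the virtual IP address follows the effective primary unit can be pictured with a short model. This is a hedged sketch rather than a description of FortiMail internals: the `Unit` class and the helper functions are hypothetical, and the actual port3 addresses 172.16.1.5 and 172.16.1.6 are the ones used for the primary and secondary units in this example's interface tables.

```python
from dataclasses import dataclass

VIRTUAL_IP = "172.16.1.2"  # HA virtual IP address used in this example


@dataclass
class Unit:
    name: str
    port3_ip: str        # actual IP address of port3 (never moves)
    effective_mode: str  # "primary", "secondary", "failed", or "off"


def port3_addresses(unit: Unit) -> list:
    """Addresses answering on port3: the actual IP, plus the virtual IP
    only while the unit's effective HA operating mode is primary."""
    addresses = [unit.port3_ip]
    if unit.effective_mode == "primary":
        addresses.append(VIRTUAL_IP)
    return addresses


def vip_owner(units) -> str:
    """Name of the single unit currently holding the virtual IP."""
    owners = [u.name for u in units if VIRTUAL_IP in port3_addresses(u)]
    if len(owners) != 1:
        raise ValueError(f"expected exactly one owner, found {owners}")
    return owners[0]


p1 = Unit("P1", "172.16.1.5", "primary")
s2 = Unit("S2", "172.16.1.6", "secondary")
print(vip_owner([p1, s2]))   # normal operation

p1.effective_mode, s2.effective_mode = "failed", "primary"
print(vip_owner([p1, s2]))   # after a failover
```

Because DNS records and firewall policies point only at the virtual IP address, nothing outside the HA group needs to change when the owner changes.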

    Also unlike the configuration of the standalone unit, both port5 and port6 are configured for each member of the HA group. The primary unit’s port5 is directly connected using a crossover cable to the secondary unit’s port5; the primary unit’s port6 is directly connected to the secondary unit’s port6. These links are used solely for heartbeat and synchronization traffic between members of the HA group.

    For comparison with the standalone unit, examine the network configuration of the primary unit in the following table.

    Example primary unit HA network interface configuration

    | Interface | IP/Netmask | Virtual IP address: Setting | Virtual IP address: IP address | Description |
    | --- | --- | --- | --- | --- |
    | port1 | 192.168.1.5 | Ignore | | Administrative connections to this FortiMail unit. Because the IP address does not follow the primary FortiMail unit, connections to this IP address are specific to this physical unit. Administrators can still connect to this FortiMail unit after failover, which may be useful for diagnostic purposes. |
    | port2, port4 | Default | Ignore | | Not connected. |
    | port3 | 172.16.1.5 | Set | 172.16.1.2 | Email connections to the FortiMail unit; the target of your email DNS MX and A records. Connections should not be destined for the actual IP address, but instead the virtual IP address (172.16.1.2) which follows the primary FortiMail unit. No administrative access. |
    | port5 | 10.0.1.2 | Ignore | | Secondary heartbeat and synchronization interface. |
    | port6 | 10.0.0.2 | Ignore | | Primary heartbeat and synchronization interface. |

    Because the Virtual IP action settings are synchronized between the primary and secondary units, you do not need to configure them separately on the secondary unit. However, you must configure the secondary unit with other settings listed in the following table.

    Example secondary unit HA network interface configuration

    | Interface | IP/Netmask | Virtual IP Address: Setting | Virtual IP Address: IP address | Description |
    | --- | --- | --- | --- | --- |
    | port1 | 192.168.1.6 | (synchronized from primary unit) | (synchronized from primary unit) | Administrative connections to this FortiMail unit. Because the IP address does not follow the primary FortiMail unit, connections to this IP address are specific to this physical unit. Administrators can connect to this FortiMail unit even when it is currently the secondary unit, which may be useful for HA configuration and log viewing. |
    | port2, port4 | Default | (synchronized from primary unit) | (synchronized from primary unit) | Not connected. |
    | port3 | 172.16.1.6 | (synchronized from primary unit) | (synchronized from primary unit) | Connections should not be destined for the actual IP address, but instead the virtual IP address (172.16.1.2) which follows the primary FortiMail unit. As a result, no connections should be destined for this network interface until a failover occurs, causing the secondary unit to become the new primary unit. No administrative access. |
    | port5 | 10.0.1.4 | (synchronized from primary unit) | (synchronized from primary unit) | Secondary heartbeat and synchronization interface. |
    | port6 | 10.0.0.4 | (synchronized from primary unit) | (synchronized from primary unit) | Primary heartbeat and synchronization interface. |

    Configuring the DNS and firewall settings

    In the example shown in Virtual IP address for HA failover, SMTP clients will connect to the virtual IP address of the primary unit. For SMTP clients on the Internet, this connection occurs through the public network virtual IP on the FortiGate unit, whose policies allow the connections and route them to the virtual IP on the current primary unit.

    Because the FortiMail HA group is installed behind a firewall performing NAT, the DNS server hosting records for the domain example.com must be configured to reflect the public IP address of the FortiGate unit, rather than the private network IP address of the HA group.

    The DNS server has been configured with:

    • an MX record to indicate that the FortiMail unit is the email gateway for example.com
    • an A record to resolve fortimail.example.com into the FortiGate unit’s public IP address
    • a reverse DNS record to enable external email servers to resolve the public IP address of the FortiGate unit into the domain name of the FortiMail unit

    Configuring the primary unit for HA operation

    The following procedure describes how to prepare a FortiMail unit for HA operation as the primary unit according to Virtual IP address for HA failover.

    In a typical standalone gateway mode configuration, you might set the IP address of the FortiMail-400 unit’s port3 network interface to 172.16.1.2. The FortiGate unit would be configured to NAT email connections to and from that IP address.

    To simulate the same configuration with the active-passive HA group, you will set the actual IP addresses of the port3 interfaces of the primary and secondary units to different IP addresses. Then, in the HA options, you will add a virtual IP address of 172.16.1.2 to port3.

    Before beginning this procedure, verify that you have completed the required preparations described in Example: Active-passive HA group in gateway mode.

    To configure the primary unit for HA operation
    1. Connect to the web-based manager of the primary unit at https://192.168.1.5/admin.
    2. Go to System > Network > Interface.
    3. Configure port6 to 10.0.0.2/255.255.255.0 and port5 to 10.0.1.2/255.255.255.0.
    4. Go to System > High Availability > Configuration.
    5. Configure the following:

      | Section | Setting | Value |
      | --- | --- | --- |
      | HA Configuration | Mode of operation | primary |
      | HA Configuration | On failure | wait for recovery then assume secondary role |
      | HA Configuration | Shared password | change_me |
      | Backup options | Backup mail data directories | enabled |
      | Backup options | Backup MTA queue directories | disabled |
      | Advanced options | HA base port | 2000 |
      | Advanced options | Heartbeat lost threshold | 15 seconds |
      | Advanced options | Remote services as heartbeat | disabled |

      For the Advanced options section, see Configuring the advanced options.

      Interface section (see Configuring interface monitoring)

      | Interface | Enable port monitor | Heartbeat status | Peer IP address |
      | --- | --- | --- | --- |
      | port6 | Enabled | Primary | 10.0.0.4 |
      | port5 | Enabled | Secondary | 10.0.1.4 |

      Virtual IP Address

      | Interface | Setting | IP address |
      | --- | --- | --- |
      | port1 | Ignore | |
      | port2 | Ignore | |
      | port3 | Set | 172.16.1.2/255.255.255.0 |
      | port4 | Ignore | |
      | port5 | Ignore | |
      | port6 | Ignore | |

    6. Click Apply.

      The FortiMail unit switches to active-passive HA mode, and, after determining that there is no other primary unit, sets its effective HA operating mode to primary. The virtual IP 172.16.1.2 is added to port3; if not already complete, configure DNS records and firewalls to route email traffic to this virtual IP address, not the actual IP address of the port3 network interface.

    7. To confirm that the FortiMail unit is acting as the primary unit, go to System > High Availability > Status and compare the Configured Operating Mode and Effective Operating Mode. Both should be primary.

      If the effective HA operating mode is not primary, the FortiMail unit is not acting as the primary unit. Determine the cause of the failover, then restore the effective operating mode to that matching its configured HA mode of operation.

    Configuring the secondary unit for HA operation

    The following procedure describes how to prepare a FortiMail unit for HA operation as the secondary unit according to Virtual IP address for HA failover.

    Before beginning this procedure, verify that you have completed the required preparations described in Example: Active-passive HA group in gateway mode. Also verify that you configured the primary unit as described in Configuring the primary unit for HA operation.

    To configure the secondary unit for HA operation
    1. Connect to the web-based manager of the secondary unit at https://192.168.1.6/admin.
    2. Go to System > Network > Interface.
    3. Configure port6 to 10.0.0.4/255.255.255.0 and port5 to 10.0.1.4/255.255.255.0.
    4. Go to System > High Availability > Configuration.
    5. Configure the following:

      | Section | Setting | Value |
      | --- | --- | --- |
      | Main Configuration | Mode of operation | secondary |
      | Main Configuration | On failure | wait for recovery then restore secondary role |
      | Main Configuration | Shared password | change_me |
      | Backup options | Backup mail data directories | enabled |
      | Backup options | Backup MTA queue directories | disabled |
      | Advanced options | HA base port | 2000 |
      | Advanced options | Heartbeat lost threshold | 15 seconds |
      | Advanced options | Remote services as heartbeat | disabled |

      For the Main Configuration section, see Configuring the primary HA options. For the Advanced options section, see Configuring the advanced options.

      Interface section (see Configuring interface monitoring)

      | Interface | Heartbeat status | Peer IP address |
      | --- | --- | --- |
      | port6 | primary | 10.0.0.2 |
      | port5 | secondary | 10.0.1.2 |

      Virtual IP Address

      (Configuration of the ports will be synchronized with the primary unit, and are therefore not required to be configured on the secondary unit.)

      | Interface | Setting | IP address |
      | --- | --- | --- |
      | port1 | Ignore | |
      | port2 | Ignore | |
      | port3 | Set | 172.16.1.2/255.255.255.0 |
      | port4 | Ignore | |
      | port5 | Ignore | |
      | port6 | Ignore | |

    6. Click Apply.

      The FortiMail unit switches to active-passive HA mode, and, after determining that the primary unit is available, sets its effective HA operating mode to secondary.

    7. Go to System > High Availability > Status.
    8. Select click HERE to start a configuration/data sync.

      The secondary unit synchronizes its configuration with the primary unit, including Virtual IP action settings that configure the HA virtual IP that the secondary unit will adopt on failover.

    9. To confirm that the FortiMail unit is acting as the secondary unit, go to System > High Availability > Status and compare the Configured Operating Mode and Effective Operating Mode. Both should be secondary.

      If the effective HA operating mode is not secondary, the FortiMail unit is not acting as the secondary unit. Determine the cause of the failover, then restore the effective operating mode to that matching its configured HA mode of operation.

      Note

      If the heartbeat interfaces are not connected, the secondary unit cannot connect to the primary unit, and so the secondary unit will operate as though the primary unit has failed and will switch its effective HA operating mode to primary.

      When both primary unit and the secondary unit are operating in their configured mode, configuration of the active-passive HA group is complete. For information on managing both members of the HA group, see Administering an HA group.
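Because each unit's Peer IP address must match the other unit's heartbeat interface address, a mismatch here is an easy way to end up with both units acting as primary. The following is a minimal sketch of a consistency check over the example values, using only Python's standard `ipaddress` module. The dictionaries and the `check_heartbeat` function are hypothetical and are not generated by or read from a FortiMail unit.

```python
import ipaddress

# Heartbeat interface settings from this example (entered by hand).
primary = {
    "port6": {"ip": "10.0.0.2/255.255.255.0", "peer": "10.0.0.4"},
    "port5": {"ip": "10.0.1.2/255.255.255.0", "peer": "10.0.1.4"},
}
secondary = {
    "port6": {"ip": "10.0.0.4/255.255.255.0", "peer": "10.0.0.2"},
    "port5": {"ip": "10.0.1.4/255.255.255.0", "peer": "10.0.1.2"},
}


def check_heartbeat(local: dict, remote: dict) -> list:
    """Return a list of problems found in local's heartbeat settings."""
    problems = []
    for port, cfg in local.items():
        iface = ipaddress.ip_interface(cfg["ip"])
        peer = ipaddress.ip_address(cfg["peer"])
        remote_ip = ipaddress.ip_interface(remote[port]["ip"]).ip
        if peer not in iface.network:
            problems.append(f"{port}: peer {peer} is outside {iface.network}")
        if peer != remote_ip:
            problems.append(f"{port}: peer {peer} does not match remote {remote_ip}")
    return problems


print(check_heartbeat(primary, secondary))
print(check_heartbeat(secondary, primary))
```

An empty list in both directions means the two units' heartbeat addresses point at each other and share a subnet on each link.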

    Administering an HA group

    In most cases, you will administer an HA group by connecting to the primary unit as if it were a standalone unit.

    Management tasks performed on each HA group member

    Connect to...

    For...

    Primary unit

    (192.168.1.5)

    • synchronized configuration items, such as antispam settings
    • primary unit HA management tasks, such as viewing its effective HA operating mode and configuring its HA mode and Shared password
    • viewing the log messages of the primary unit

    Secondary unit

    (192.168.1.6)

    • secondary unit HA management tasks, such as viewing its effective HA operating mode and configuring its HA mode and Shared password
    • viewing the log messages of the secondary unit

    If the initial configuration synchronization fails, such as if it is disrupted or the network cable is loose, you should manually trigger synchronization after changing the configuration of the primary unit. For information on manually triggering configuration synchronization, see Using high availability (HA).

    Note

    Some parts of the configuration are not synchronized, and must be configured separately on each member of the HA group. For details, see Configuration settings that are not synchronized.