Using high availability (HA)
Go to System > High Availability to configure the FortiMail unit to act as a member of a high availability (HA) cluster in order to increase processing capacity or availability.
For the general procedure on how to enable and configure HA, see How to use HA.
This section contains the following topics:
- About high availability
- About the heartbeat and synchronization
- About logging, alert email and SNMP in HA
- How to use HA
- Monitoring the HA status
- Configuring the HA mode and group
- Configuring service-based failover
- Example: Failover scenarios
- Example: Active-passive HA group in gateway mode
About high availability
FortiMail units can operate in one of two HA modes, active-passive or config-only.
Comparison of HA modes
Active-passive HA | Config-only HA |
---|---|
2 FortiMail units in the HA group | 2-25 FortiMail units in the HA group |
Typically deployed behind a switch | Typically deployed behind a load balancer |
Both configuration* and data synchronized | Only configuration* synchronized |
Only primary unit processes email | All units process email |
No data loss when hardware fails | Data loss when hardware fails |
Failover protection, but no increased processing capacity | Increased processing capacity, but no failover protection |
* For exceptions to synchronized configuration items, see Configuration settings that are not synchronized.
Active-passive HA group operating in gateway mode
Config-only HA group operating in gateway mode
You can mix different FortiMail models in the same HA group. However, all units in the HA group must have the same firmware version.
When mixing FortiMail models, the HA group is limited by the capacity and configuration limits of the least powerful model.
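The group constraints above (unit count per HA mode, and identical firmware across members) can be checked before deployment. The following is a minimal planning sketch, not a FortiMail tool: the `Unit` record, the `validate_ha_group` function, and the model and firmware strings in the usage note are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Group-size limits taken from the comparison of HA modes above.
GROUP_SIZE_LIMITS = {
    "active-passive": (2, 2),
    "config-only": (2, 25),
}


@dataclass(frozen=True)
class Unit:
    """Hypothetical inventory record for one planned HA member."""
    name: str
    model: str
    firmware: str


def validate_ha_group(mode, units):
    """Return a list of problems; an empty list means these checks passed."""
    if mode not in GROUP_SIZE_LIMITS:
        return [f"unknown HA mode: {mode!r}"]

    problems = []
    low, high = GROUP_SIZE_LIMITS[mode]
    if not low <= len(units) <= high:
        problems.append(
            f"{mode} groups need {low}-{high} units, got {len(units)}"
        )

    # Models may be mixed, but every unit must run the same firmware version.
    firmware_versions = {unit.firmware for unit in units}
    if len(firmware_versions) > 1:
        problems.append(
            "firmware versions differ: " + ", ".join(sorted(firmware_versions))
        )
    return problems
```

For example, `validate_ha_group("active-passive", [Unit("a", "model-A", "1.0"), Unit("b", "model-B", "1.0")])` returns an empty list, because mixing models is allowed as long as the firmware matches.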
Communications between HA cluster members occur through the heartbeat and synchronization connection. For details, see About the heartbeat and synchronization.
To configure FortiMail units operating in HA mode, you usually connect only to the primary unit. The primary unit’s configuration is almost entirely synchronized to secondary units, so that changes made to the primary unit are propagated to the secondary units. The web-based manager of the secondary unit may display “SECONDARY MODE” as a reminder that most configuration changes cannot be made through the secondary unit, but instead must be made through the primary unit. For details, see Banner.
Exceptions to this rule include connecting to a secondary unit in order to view log messages recorded about the secondary unit itself on its own hard disk, and connecting to a secondary unit to configure settings that are not synchronized. For details, see Configuration settings that are not synchronized.
For instructions on how to enable and configure HA, see How to use HA.
See also
About the heartbeat and synchronization
About logging, alert email and SNMP in HA
Storing mail data on a NAS server
Example: Active-passive HA group in gateway mode
About the heartbeat and synchronization
Heartbeat and synchronization traffic consists of TCP packets transmitted between the FortiMail units in the HA group through the primary and secondary heartbeat interfaces.
Service monitoring traffic can also, for short periods, be used as a heartbeat. For details, see Remote services as heartbeat.
Heartbeat and synchronization traffic has three primary functions:
- to monitor the responsiveness of the HA group members
- to synchronize configuration changes from the primary unit to the secondary units
- to synchronize mail data from the primary unit to the secondary unit (active-passive only)
For exceptions to synchronized configuration items, see Configuration settings that are not synchronized.
Mail data consists of the FortiMail system mail directory, user home directories, and mail queue.
FortiGuard Antispam packages and FortiGuard Antivirus engines and definitions are not synchronized between primary and secondary units.
When the primary unit’s configuration changes, it immediately synchronizes the change to the secondary unit (or, in a config-only HA group, to the peer units) through the primary heartbeat interface. If this fails, or if you have inadvertently de-synchronized the secondary unit’s configuration, you can manually initiate synchronization. For details, see Using high availability (HA). You can also use the CLI command diagnose system ha sync on either the primary unit or the secondary unit to manually synchronize the configuration. For details, see the FortiMail CLI Reference.
During normal operation, the secondary unit expects to constantly receive heartbeat traffic from the primary unit. Loss of the heartbeat signal interrupts the HA group, and, if it is active-passive in style, generally triggers a failover. For details, see Failover scenario 1: Temporary failure of the primary unit.
Exceptions include system restarts and the execute reload CLI command. In case of a system reboot or reload of the primary unit, the primary unit signals the secondary unit to wait for the primary unit to complete the restart or reload. For details, see Failover scenario 2: System reboot or reload of the primary unit.
Periodically, the secondary unit checks with the primary unit to see if there are any configuration changes on the primary unit. If there are configuration changes, the secondary unit will pull the configuration changes from the primary unit, generate a new configuration, and reload the new configuration. In this case, both the primary and secondary units send alert email. For details, see Failover scenario 3: System reboot or reload of the secondary unit.
Behavior varies by your HA mode when the heartbeat fails:
- Active-passive HA
A new primary unit is elected: the secondary unit becomes the new primary unit and assumes the duty of processing email. During the failover, no mail data or configuration changes are lost, but some in-progress email deliveries may be interrupted. These interrupted deliveries may need to be restarted, but most email clients and servers can gracefully handle this. Additional failover behaviors may be configured. For details, see On failure.
Maintain the heartbeat connection. If the heartbeat is accidentally interrupted for an active-passive HA group, such as when a network cable is temporarily disconnected, the secondary unit will assume that the primary unit has failed, and become the new primary unit. If no failure has actually occurred, both FortiMail units will be operating as primary units simultaneously. For details on correcting this, see Using high availability (HA).
- Config-only HA
Each secondary unit continues to operate normally. However, with no primary unit, changes to the configuration are no longer synchronized. You must manually configure one of the secondary units to operate as the primary unit, synchronizing its changes to the remaining secondary units.
For failover examples and steps required to restore normal operation of the HA group in each case, see Example: Failover scenarios.
HA default ports and protocols
The following default ports are used for HA heartbeat and synchronization. If a firewall sits between the primary and secondary units, make sure that your firewall policies allow the following ports:
Protocol/Port | Purpose |
---|---|
UDP/20000 | Base port for HA heartbeat signal |
UDP/20001 | Synchronization control |
TCP/20002 | File synchronization |
TCP/20003 | Data synchronization |
TCP/20004 | Checksum synchronization |
TCP/25 | HA service monitoring - remote SMTP |
TCP/80 | HA service monitoring - remote HTTP |
TCP/110 | HA service monitoring - remote POP3 |
TCP/143 | HA service monitoring - remote IMAP |
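When a firewall separates the units, a quick reachability probe of the TCP ports listed above can help confirm the firewall policies. The sketch below is an illustration only, using Python's standard `socket` module; the function names are hypothetical, and it assumes it is run from a host on the same network path as the peer unit. The UDP ports (20000 and 20001) are left out because a plain connect cannot confirm that a UDP port is reachable.

```python
import socket

# Default TCP ports from the table above (UDP/20000 and UDP/20001 omitted).
HA_TCP_PORTS = {
    20002: "File synchronization",
    20003: "Data synchronization",
    20004: "Checksum synchronization",
    25: "HA service monitoring - remote SMTP",
    80: "HA service monitoring - remote HTTP",
    110: "HA service monitoring - remote POP3",
    143: "HA service monitoring - remote IMAP",
}


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ha_ports(host, ports=HA_TCP_PORTS, timeout=2.0):
    """Probe each TCP port and return a {port: reachable} mapping."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

A call such as `check_ha_ports("192.0.2.10")` (a documentation-range address standing in for the peer unit) returns one boolean per port. A `False` result does not distinguish a firewall block from a service that is simply not listening, so treat it as a prompt to investigate rather than as proof of a policy problem.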
See also
Configuration settings that are not synchronized
Synchronization of MTA queue directories after a failover
About logging, alert email and SNMP in HA
Storing mail data on a NAS server
Configuring the HA mode and group
Configuring service-based failover
Example: Active-passive HA group in gateway mode
Configuration settings that are not synchronized
All configuration settings on the primary unit are synchronized to the secondary unit, except the following:
HA settings not synchronized
Setting | Description |
---|---|
Operation mode | You must set the operation mode (gateway, transparent, or server) of each HA group member before configuring HA. |
Host name | The host name distinguishes members of the cluster. For details, see Host name. |
Static route | Static routes are not synchronized because the HA units may be in different networks (see Configuring static routes). |
(gateway and server mode only) | Each FortiMail unit in the HA group must be configured with different network interface settings for connectivity purposes. For details, see Configuring the network interfaces. Exceptions include some active-passive HA settings which affect the interface configuration for failover purposes. These settings are synchronized. For details, see Virtual IP Address. |
Management IP address (transparent mode only) | Each FortiMail unit in the HA group should be configured with different management IP addresses for connectivity purposes. For details, see About the management IP. |
SNMP system information | Each FortiMail unit in the HA group will have its own SNMP system information, including the Description, Location, and Contact. For details, see Configuring the network interfaces. |
RAID configuration | RAID settings are hardware-dependent and determined at boot time by looking at the drives (for software RAID) or the controller (hardware RAID), and are not stored in the system configuration. Therefore, they are not synchronized. |
Main HA configuration | The main HA configuration, which includes the HA mode of operation (such as primary or secondary), is not synchronized because this configuration must be different on the primary and secondary units. For details, see Configuring the HA mode and group. |
HA Daemon configuration | The following HA daemon settings are not synchronized: You must add the shared HA password to each unit in the HA group. All units in the HA group must use the same shared password to identify the group. Since the mail data and MTA queue backup settings are not synchronized, to use this feature, you must enable it on both the primary and secondary units. Synchronized HA daemon options that are active-passive HA settings affect how often the secondary unit tests the primary unit and how the secondary unit synchronizes configuration and mail data. Because HA daemon settings on the secondary unit control how the HA daemon operates, in a functioning HA group you would change the HA daemon configuration on the secondary unit to change how the HA daemon operates. The HA daemon settings on the primary unit do not affect the operation of the HA daemon. |
HA service monitoring configuration | In active-passive HA, the HA service monitoring configuration is not synchronized. The remote service monitoring configuration on the secondary unit controls how the secondary unit checks the operation of the primary unit. The local services configuration on the primary unit controls how the primary unit tests the operation of the primary unit. For details, see Configuring service-based failover. Note: You might want to have a different service monitoring configuration on the primary and secondary units. For example, after a failover you may not want service monitoring to operate until you have fixed the problems that caused the failover and have restarted normal operation of the HA group. |
Product name and icon | The product names and icons under System > Customization > Appearance are not synchronized. All other appearance settings are synchronized. |
Config-only HA | In config-only HA, the following settings are not synchronized: |
See also
About the heartbeat and synchronization
Synchronization of MTA queue directories after a failover
During normal operation, email messages are in one of three states:
- being received or sent by the primary unit
- waiting to be delivered in the mail queue
- stored on the primary unit’s mail data directories (email quarantines, email archives, and email inboxes of server mode)
When normal operation of an active-passive HA group is interrupted and a failover occurs, sending and receiving is interrupted. The delivery attempt fails, and the sender usually retries sending the email message. However, stored messages remain in the primary unit’s mail data directories.
You usually should configure HA to synchronize the stored mail data to prevent loss of email messages, but you usually will not want to regularly synchronize the mail queue. This is because, to prevent loss of email messages in the failed primary unit, FortiMail units in active-passive HA use the following failover mechanism:
If the failed primary unit’s effective HA operating mode is failed, a sequence similar to the following occurs automatically when the problem that caused the failure is corrected.
- The secondary unit detects the failure of the primary unit, and becomes the new primary unit.
- The former primary unit restarts, detects the new primary unit, and becomes a secondary unit.
- The former primary unit pushes its mail queue to the new primary unit.
- The new primary unit delivers email in its mail queues, including email messages synchronized from the new secondary unit.
You may have to manually restart the failed primary unit.
This synchronization occurs through the heartbeat link between the primary and secondary units, and prevents duplicate email messages from forming in the primary unit’s mail queue.
As a result, as long as the failed primary unit can restart, no email is lost from the mail queue.
Even if you choose to synchronize the mail queue, because its contents change very rapidly and synchronization is periodic, there is a chance that some email in these directories will not be synchronized at the exact moment a failover occurs.
See also
About the heartbeat and synchronization
About logging, alert email and SNMP in HA
To configure logging and alert email, configure the primary unit and enable HA events. When the configuration changes are synchronized to the secondary units, all FortiMail units in the HA group record their own separate log messages and send separate alert email messages. Log data is not synchronized. For details on configuring logging and viewing log messages, see Logs, reports and alerts.
To distinguish alert email from each member of the HA cluster, configure a different host name for each member. For details, see Host name.
To use SNMP, configure each cluster member separately and enable HA events for the community. If you enable SNMP for all units, they can all send SNMP traps. Additionally, you can use an SNMP server to monitor the primary and secondary units for HA settings, such as the HA configured and effective mode of operation. For details on SNMP, see Configuring the network interfaces.
To aid in quick discovery and diagnosis of network problems, consider configuring SNMP, Syslog, and/or alert email to monitor the HA cluster for failover messages.
See also
Getting HA information using SNMP
About the heartbeat and synchronization
Getting HA information using SNMP
You can use an SNMP manager to get information about how FortiMail HA is operating. The FortiMail MIB (fortimail.mib) and the FortiMail trap MIB (fortimail.trap.mib) include the HA fields listed below.
FortiMail MIB fields
MIB field | Description |
---|---|
fortimail.mib | |
fmlHAEventId | Provides the ID of the most recent HA event. |
fmlHAUnitIp | Provides the IP address of the port1 interface of the FortiMail unit on which an HA event occurred. |
fmlHAEventReason | Provides the description of the reason for the HA event. |
fmlHAMode | Provides the HA mode of operation that you configured the FortiMail unit to operate in (either as primary or secondary). |
fmlHAEffectiveMode | Provides the effective HA mode of operation (applies to active-passive HA only), either as the primary unit or as the secondary unit. The effective HA mode of operation matches the configured mode of operation unless a failure has occurred. |
fortimail.trap.mib | |
fmlTrapHAEvent | Provides the FortiMail HA trap that is sent when an HA event occurs. This trap includes the contents of the |
How to use HA
In general, to enable and configure HA, you should perform the following:
- If the HA cluster will use FortiGuard Antivirus and/or FortiGuard Antispam services, license all FortiMail units in the HA group for the FortiGuard Antispam and FortiGuard Antivirus services, and register them with the Fortinet Technical Support web site, https://support.fortinet.com/.
- Physically connect the FortiMail units that will be members of the HA cluster.
You must connect at least one of their network interfaces for heartbeat and synchronization traffic between members of the cluster. For reliability reasons, Fortinet recommends that you connect both a primary and a secondary heartbeat interface, and that they be connected directly or through a dedicated switch that is not connected to your overall network.
- For config-only clusters, configure each member of the cluster to store mail data on a NAS server that supports NFS connections (active-passive groups may also use a NAS server, but do not require it). For details, see Selecting the mail data storage location.
- On each member of the cluster:
- Enable the HA mode that you want to use (either active-passive or config-only) and select whether the individual member will act as a primary unit or secondary unit within the cluster. For information about the differences between the HA modes, see About high availability.
- Configure the local IP addresses of the primary and secondary heartbeat and synchronization network interfaces.
- For active-passive clusters, configure the behavior on failover, and how the network interfaces should be configured for whichever FortiMail unit is currently acting as the primary unit. Additionally, if the FortiMail units store mail data on a NAS, disable mail data synchronization between members.
- For config-only clusters, if the FortiMail unit is a primary unit, configure the IP addresses of its secondary units; if the FortiMail unit is a secondary unit, configure the IP address of its primary unit.
For details, see Configuring the HA mode and group.
See also
About the heartbeat and synchronization
Centrally monitoring the HA cluster
Monitoring the HA status
The Status tab in the High Availability submenu shows the configured HA mode of operation of a FortiMail unit in an HA group. You can also manually initiate synchronization and reset the HA mode of operation. A reset may be required if a FortiMail unit’s effective HA mode of operation differs from its configured HA mode of operation, such as after a failover when a configured primary unit is currently acting as a secondary unit.
For FortiMail units operating as secondary units, the Status tab also lets you view the status and schedule of the HA synchronization daemon.
Appearance of the Status tab varies by:
- whether the HA group is active-passive or config-only
- whether the FortiMail unit is configured as a primary unit or secondary unit
- whether a failover has occurred (active-passive only)
If HA is disabled, this tab displays:
HA mode is currently disabled
Before you can use the Status tab, you must first enable and configure HA. For details, see Configuring the HA mode and group.
To view the HA mode of operation status, go to System > High Availability > Status.
Viewing HA status
GUI item |
Description |
---|---|
Configured Operating Mode |
Displays the HA operating mode that you configured, either:
For information on configuring the HA operating mode, see HA mode. After a failure, the FortiMail unit may not be acting in its configured HA operating mode. For details, see Using high availability (HA). |
Effective Operating Mode |
Displays the mode that the unit is currently operating in, either:
The configured HA operating mode matches the effective operating mode unless a failure has occurred. For example, after a failover, a FortiMail unit configured to operate as a secondary unit could be acting as a primary unit. For explanations of combinations of configured and effective HA modes of operation, see Monitoring the HA status. For information on restoring the FortiMail unit to an effective HA operating mode that matches the configured operating mode, see Using high availability (HA). This option appears only if the FortiMail unit is a member of an active-passive HA group. |
Detail Status |
This table is viewable, when HA is configured, by all HA units (primary/secondary, and config-primary/config-secondary):
Monitoring occurs through the heartbeat link between the primary and secondary units. For details, see HA base port. |
Action |
Displays the actions you can take, depending on the context:
|
Combinations of configured and effective HA modes of operation
Configured operating mode | Effective operating mode | Description |
---|---|---|
Primary | Primary | Normal for the primary unit of an active-passive HA group. |
Secondary | Secondary | Normal for the secondary unit of an active-passive HA group. |
Primary | Off | The primary unit has experienced a failure, or the FortiMail unit is in the process of switching to operating in HA mode. HA processes and email processing are stopped. |
Secondary | Off | The secondary unit has detected a failure, or the FortiMail unit is in the process of switching to operating in HA mode. After the secondary unit starts up and connects with the primary unit to form an HA group, the first configuration synchronization may fail in special circumstances. To prevent both the secondary and primary units from simultaneously acting as primary units, the effective HA mode of operation becomes off. If subsequent synchronization fails, the secondary unit’s effective HA mode of operation becomes primary. |
Primary | Failed | The remote service monitoring or local network interface monitoring on the primary unit has detected a failure, and will attempt to connect to the other FortiMail unit. If the problem that caused the failure has been corrected, the effective HA mode of operation switches from failed to secondary, or to match the configured HA mode of operation, depending on the On failure setting. Additionally, if the HA group is operating in transparent mode, and if the effective HA mode of operation changes to failed, the network interface IP/netmask on the secondary unit displays bridging (waiting for recovery). For details, see Configuring the network interfaces. |
Primary | Secondary | The primary unit has experienced a failure but then returned to operation. When the failure occurred, the unit configured to be the secondary unit became the primary unit. When the unit configured to be the primary unit restarted, it detected the new primary unit and so switched to operating as the secondary unit. |
Secondary | Primary | The secondary unit has detected that the FortiMail unit configured to be the primary unit failed. When the failure occurred, the unit configured to be the secondary unit became the primary unit. |
Config primary | N/A | Normal for the primary unit of a config-only HA group. |
Config secondary | N/A | Normal for the secondary unit of a config-only HA group. |
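If you collect the configured and effective modes from the Status tab or through SNMP, the combinations above can be turned into a simple lookup for monitoring scripts. This is a sketch under stated assumptions: the `HA_STATES` mapping and `describe_ha_state` function are hypothetical names, the summaries are shortened paraphrases of the table, and the normal/not-normal flag is an interpretation added for illustration.

```python
# (configured, effective) -> (is_normal, summary), paraphrased from the table.
HA_STATES = {
    ("primary", "primary"):
        (True, "Normal for the primary unit of an active-passive HA group."),
    ("secondary", "secondary"):
        (True, "Normal for the secondary unit of an active-passive HA group."),
    ("primary", "off"):
        (False, "Primary unit failed or is switching to HA mode; "
                "HA processes and email processing are stopped."),
    ("secondary", "off"):
        (False, "Secondary unit detected a failure or is switching to HA mode."),
    ("primary", "failed"):
        (False, "Monitoring on the primary unit detected a failure."),
    ("primary", "secondary"):
        (False, "Configured primary unit returned after a failover and "
                "now acts as the secondary unit."),
    ("secondary", "primary"):
        (False, "Configured secondary unit took over after the primary "
                "unit failed."),
    ("config primary", "n/a"):
        (True, "Normal for the primary unit of a config-only HA group."),
    ("config secondary", "n/a"):
        (True, "Normal for the secondary unit of a config-only HA group."),
}


def describe_ha_state(configured, effective):
    """Look up a (configured, effective) pair, ignoring case and padding."""
    key = (configured.strip().lower(), effective.strip().lower())
    return HA_STATES.get(key, (False, "Combination not listed in the table."))
```

For example, `describe_ha_state("Secondary", "Primary")` reports a not-normal state, which is the situation where restoring the units to their configured roles may be required.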
See also
About the heartbeat and synchronization
About logging, alert email and SNMP in HA
Storing mail data on a NAS server
Configuring the HA mode and group
Configuring service-based failover
Example: Active-passive HA group in gateway mode
Restarting the HA processes on a stopped primary unit
If you configured service monitoring on an active-passive HA group (see Configuring service-based failover) and either the primary unit or the secondary unit detects a service failure on the primary unit, the primary unit changes its effective HA mode of operation to off, stops processing email, and halts all of its HA processes.
After resolving the problem that caused the failure, you can use the following steps to restart the HA processes on the primary unit.
In this example, resolving this problem could be as simple as reconnecting the cable to the port2 network interface. Once the problem is resolved, use the following steps to restart the stopped primary unit.
To restart a stopped primary unit
- Log in to the web-based manager of the primary unit.
- Go to System > High Availability > Status.
- Under Action, click Restart the HA system.
The primary unit restarts and rejoins the HA group.
If a failover has occurred due to processes being stopped on the primary unit, and the secondary unit is currently acting as the primary unit, you can restore the primary and secondary units to acting in their configured roles. For details, see Using high availability (HA).
See also
Configuring service-based failover
Example: Active-passive HA group in gateway mode
Configuring the HA mode and group
The Configuration tab in the System > High Availability submenu lets you configure the high availability (HA) options, including:
- enabling HA
- selecting whether the HA group is active-passive or config-only in style
- whether this individual FortiMail unit will act as a primary unit or a secondary unit in the cluster
- network interfaces that will be used for heartbeat and synchronization
- service monitor
For config-only HA, if the FortiMail unit is operating in server mode, you must store mail data externally, on a NAS server. Failure to store mail data externally could result in mailboxes and other data scattered over multiple FortiMail units. For details on configuring NAS, see Storing mail data on a NAS server and Selecting the mail data storage location.
For an explanation of active-passive and config-only, see About high availability.
HA settings, with the exception of Virtual IP Address settings, are not synchronized and must be configured separately on each primary and secondary unit.
You must maintain the physical link between the heartbeat and synchronization network interfaces. These connections enable cluster members to detect the responsiveness of other members, and to synchronize data. If they are interrupted, normal operation will be interrupted and, for active-passive HA groups, a failover will occur. For more information on heartbeat and synchronization, see About the heartbeat and synchronization.
For an active-passive HA group, or a config-only HA group consisting of only two FortiMail units, directly connect the heartbeat network interfaces using a crossover Ethernet cable. For a config-only HA group consisting of more than two FortiMail units, connect the heartbeat network interfaces through a switch, and do not connect this switch to your overall network.
To configure HA options
- Go to System > High Availability > Configuration.
- Configure the following sections, as applicable:
The appearance of sections and the options in them vary greatly with your choice in the Mode of operation drop-down list.
- Configuring the primary HA options
- Configuring the primary configuration IP
- Configuring the advanced options
- Configuring the secondary system options
- Storing mail data on a NAS server
- Configuring interface monitoring
- Configuring service-based failover
Configuring the primary HA options
Go to System > High Availability > Configuration and click the arrow to expand the HA configuration section, if needed. The options presented vary greatly depending on your choice in the Mode of operation drop-down list.
HA main options
GUI item |
Description |
Enables or disables HA, selects active-passive or config-only HA, and selects the initial configured role of this FortiMail unit in the HA group.
|
|
Select one of the following behaviors of the primary unit when it detects a failure, such as on a power failure or from service/interface monitoring.
In most cases, you should select the wait for recovery then restore secondary role option. This option appears only if HA mode is primary. |
|
Enter an HA password for the HA group. You must configure the same Shared password value on both the primary and secondary units. |
|
Enable centralized monitor |
Enable or disable the central statistics service. Once enabled, administrators on the primary HA unit can monitor the state and activity of each HA cluster member, including CPU, memory, and disk usage, email throughput, and other statistic summaries. This feature can also be enabled in the CLI. For more information, see Centrally monitoring the HA cluster. |
Configuring the primary configuration IP
If you are configuring the unit as the secondary unit in a config-only group, go to System > High Availability > Configuration to configure the primary IP address.
In the Primary IP address field, enter the IP of the primary heartbeat network interface of the primary unit. The secondary unit synchronizes only with this primary unit’s IP address.
Configuring the advanced options
Go to System > High Availability > Configuration to configure the advanced options.
For config-only groups, just the HA base port option appears.
HA advanced options
GUI item |
Description |
Synchronize mail data directory |
Synchronize system quarantine, email archives, email users’ mailboxes (server mode only), preferences, and per-recipient quarantines. Unless the HA cluster stores its mail data on a NAS server, you should configure the HA cluster to synchronize mail directories. If mail data changes frequently, you can manually initiate a data synchronization when significant changes are complete. For details, see Using high availability (HA). |
Synchronize the mail queue of the FortiMail unit. For more information on the mail queue, see Managing the mail queue. Caution: If the primary unit experiences a hardware failure and you cannot restart it, and if this option is disabled, MTA queue directory data could be lost. Note: Enabling this option can affect the FortiMail unit’s performance, because periodic synchronization of the mail queue can be processor and bandwidth-intensive. Additionally, because the content of the MTA queue directories is very dynamic, periodically synchronizing MTA queue directories between FortiMail units may not guarantee against loss of all email in those directories. Even if MTA queue directory synchronization is disabled, after a failover, a separate synchronization mechanism may successfully prevent loss of MTA queue data. For details, see Synchronization of MTA queue directories after a failover. |
|
Enter the first of four TCP port numbers that will be used for:
Note: For active-passive groups, in addition or alternatively to configuring the heartbeat, you can configure service monitoring. For details, see Configuring service-based failover. Note: In addition to automatic immediate and periodic configuration synchronization, you can also manually initiate synchronization. For details, see Using high availability (HA). |
|
Enter the total span of time, in seconds, for which the primary unit can be unresponsive before it triggers a failover and the secondary unit assumes the role of the primary unit. The heartbeat will continue to check for availability once per second. To prevent premature failover when the primary unit is simply experiencing very heavy load, configure a total threshold of three (3) seconds or more to allow the secondary unit enough time to confirm unresponsiveness by sending additional heartbeat signals. Note: If the failure detection time is too short, the secondary unit may falsely detect a failure during periods of high load. Caution: If the failure detection time is too long, the primary unit could fail and a delay in detecting the failure could mean that email is delayed or lost. Decrease the failure detection time if email is delayed or lost because of an HA failover. |
|
Enable to use remote service monitoring as a secondary HA heartbeat. If enabled and both the primary and secondary heartbeat links fail or become disconnected, if remote service monitoring still detects that the primary unit is available, a failover will not occur. Note: The remote service check is only applicable for temporary heartbeat link failures. If the HA process restarts due to system reboot or HA daemon reboot, physical heartbeat connections will be checked first. If the physical connections are not found, the remote service monitoring no longer takes effect. Note: Using remote services as heartbeat provides HA heartbeat only, not synchronization. To avoid synchronization problems, you should not use remote service monitoring as a heartbeat for extended periods. This feature is intended only as a temporary heartbeat solution that operates until you reestablish a normal primary or secondary heartbeat link. |
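To reason about how the failure detection time and the remote-services heartbeat option interact, a simplified model can help. The sketch below is not FortiMail's implementation: it assumes one heartbeat check per second (as described above), treats the detection time as a count of consecutive missed checks, and lets a successful remote service check stand in for a missed heartbeat. The function name and parameters are hypothetical.

```python
def seconds_until_failover(heartbeat_ok, detection_time, remote_service_ok=None):
    """Return the 1-based second at which failover would trigger, or None.

    heartbeat_ok: sequence of booleans, one per one-second heartbeat check.
    detection_time: consecutive unresponsive seconds tolerated before failover.
    remote_service_ok: optional sequence of booleans of the same length; when
        given, models the "remote services as heartbeat" option being enabled.
    """
    missed = 0
    for second, ok in enumerate(heartbeat_ok, start=1):
        if ok:
            missed = 0
            continue
        if remote_service_ok is not None and remote_service_ok[second - 1]:
            # Remote service monitoring still sees the primary unit, so this
            # check is treated as a heartbeat rather than as a miss.
            missed = 0
            continue
        missed += 1
        if missed >= detection_time:
            return second
    return None
```

With a detection time of 3, the sequence `[True, False, False, False, True]` triggers a failover at second 4, while `[True, False, False, True, False]` does not trigger one, because the heartbeat recovers before three consecutive checks are missed. This illustrates why a very short detection time risks false failovers under heavy load.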
Configuring the secondary system options
This section appears only when the mode of operation is set to config-primary under System > High Availability > Configuration.
HA peer options
GUI item |
Description |
IP address |
Double-click in order to modify, then enter the IP address of the primary network interface on that secondary unit. |
Create |
Click to add a secondary unit to the list of Peer systems, then double-click its IP address. The primary unit synchronizes only with secondary units in the list of Peer systems. |
Delete |
Click the row corresponding to a peer IP address, then click this button to remove that secondary unit from the HA group. |
See also
About the heartbeat and synchronization
About logging, alert email and SNMP in HA
Storing mail data on a NAS server
Configuring service-based failover
Example: Active-passive HA group in gateway mode
Storing mail data on a NAS server
For FortiMail units operating in server mode as a config-only HA group, you must store mail data on a NAS server instead of locally. If mail data is stored locally, email users’ messages and other mail data could be scattered across multiple FortiMail units.
Even if your FortiMail units are not operating in server mode with config-only HA, however, storing mail data on a NAS server may have a number of benefits for your organization. For example, backing up your NAS server regularly can help prevent loss of mail data. Also, if your FortiMail unit experiences a temporary failure, you can still access the mail data on the NAS server. When the FortiMail unit restarts, it can usually continue to access and use the mail data stored on the NAS server.
For config-only HA groups using a network attached storage (NAS) server, only the primary unit sends quarantine reports to email users. The primary unit also acts as a proxy between email users and the NAS server when email users use FortiMail webmail to access quarantined email and to configure their own Bayesian filters.
For active-passive HA groups, the primary unit reads and writes all mail data to and from the NAS server in the same way as a standalone unit. If a failover occurs, the new primary unit uses the same NAS server and can access all mail data that the original primary unit stored there. As a result, if you are using a NAS server to store mail data, the new primary unit continues operating after a failover with no loss of mail data.
If the FortiMail unit is a member of an active-passive HA group, and the HA group stores mail data on a remote NAS server, disable mail data synchronization to prevent duplicate mail data traffic. |
For instructions on storing mail data on a NAS server, see Selecting the mail data storage location.
See also
About the heartbeat and synchronization
Configuring the HA mode and group
Configuring interface monitoring
In active-passive HA mode, Interface monitor checks the local interfaces on the primary unit. If a malfunctioning interface is detected, a failover will be triggered.
To configure interface monitoring
- Go to System > High Availability > Configuration.
- Select primary or secondary as the mode of operation.
- Expand the Interface area, if required.
- Click on the port/interface name to configure the interface. For details, see Configuring the network interfaces.
- Select a row in the table and click Edit to configure the following HA settings on the interface.
The interface IP address must be different from, but on the same subnet as, the IP addresses of the other heartbeat network interfaces of other members in the HA group. When configuring other FortiMail units in the HA group, use this value as the:
- Remote peer IP (for active-passive groups)
- Primary configuration (for secondary units in config-only groups)
- Peer systems (for the primary unit on config-only groups) |
GUI item |
Description |
Port |
Displays the interface name you’re configuring. |
Enable port monitor |
Enable to monitor a network interface for failure. If the port fails, the primary unit will trigger a failover. |
Heartbeat status |
Specify if this interface will be used for HA heartbeat and synchronization.
Do not use this interface for HA heartbeat and synchronization.
Select the primary network interface for heartbeat and synchronization traffic. For more information, see About the heartbeat and synchronization. This network interface must be connected directly or through a switch to the Primary heartbeat network interface of other members in the HA group.
Select the secondary network interface for heartbeat and synchronization traffic. For more information, see About the heartbeat and synchronization. The secondary heartbeat interface is the backup heartbeat link between the units in the HA group. If the primary heartbeat link is functioning, the secondary heartbeat link is used for the HA heartbeat. If the primary heartbeat link fails, the secondary link is used for the HA heartbeat and for HA synchronization. This network interface must be connected directly or through a switch to the Secondary heartbeat network interfaces of other members in the HA group. Caution: Using the same network interface for both HA synchronization/heartbeat traffic and other network traffic could result in issues with heartbeat and synchronization during times of high traffic load, and is not recommended. Note: In general, you should isolate the network interfaces that are used for heartbeat traffic from your overall network. Heartbeat and synchronization packets contain sensitive configuration information, are latency-sensitive, and can consume considerable network bandwidth. |
Peer IP address |
Enter the IP address of the matching heartbeat network interface of the other member of the HA group. For example, if you are configuring the primary unit’s primary heartbeat network interface, enter the IP address of the secondary unit’s primary heartbeat network interface. Similarly, for the secondary heartbeat network interface, enter the IP address of the other unit’s secondary heartbeat network interface. For information about configuration synchronization and what is not synchronized, see About the heartbeat and synchronization. This option appears only for active-passive HA. |
Peer IPv6 address |
Enter the peer IPv6 address in the active-passive HA group. For IPv6 support, see About IPv6 Support. |
Select whether and how to configure the IP addresses and netmasks of the FortiMail unit whose effective HA mode of operation is currently primary. For example, a primary unit might be configured to receive email traffic through port1 and receive heartbeat and synchronization traffic through port5 and port6. In that case, you would configure the primary unit to set the IP addresses or add virtual IP addresses for port1 of the secondary unit on failover in order to mimic that of the primary unit.
Note: Settings in this section are synchronizable. Configure the primary unit, then synchronize it to the secondary unit. For details, see Using high availability (HA). |
|
Enter the virtual IPv4 address for this interface. |
|
Virtual IPv6 address |
Enter the virtual IPv6 address for this interface. For IPv6 support, see About IPv6 Support. |
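The addressing rule for heartbeat interfaces described above (a peer address that differs from the local address but lies on the same subnet) can be checked before you enter the values. The following is a minimal Python sketch that uses only the standard ipaddress module; the function name check_heartbeat_peer and the sample addresses are illustrative assumptions and are not taken from any FortiMail configuration.

```python
import ipaddress
from typing import List

def check_heartbeat_peer(local_cidr: str, peer_ip: str) -> List[str]:
    """Return a list of problems with a heartbeat interface / peer IP pair."""
    local = ipaddress.ip_interface(local_cidr)   # for example "10.0.1.1/24"
    peer = ipaddress.ip_address(peer_ip)
    problems = []
    if peer.version != local.version:
        # Mixed IPv4/IPv6 values cannot be compared any further.
        problems.append("peer and local addresses use different IP versions")
        return problems
    if peer == local.ip:
        problems.append("peer IP address is identical to the local address")
    if peer not in local.network:
        problems.append("peer IP address is outside the local subnet")
    return problems

# Hypothetical heartbeat addresses, used only for illustration.
print(check_heartbeat_peer("10.0.1.1/24", "10.0.1.2"))
```

An empty list means that the pair satisfies the rule; any other result names the condition that was violated.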
Configuring service-based failover
Go to System > High Availability > Configuration to configure remote service monitoring, local network interface monitoring, and local hard drive monitoring.
Service monitoring is not available for config-only HA groups. |
HA service monitoring settings are not synchronized and must be configured separately on each primary and secondary unit.
With remote service monitoring, the secondary unit confirms that it can connect to the primary unit over the network using SMTP service, POP service (POP3), and Web service (HTTP) connections. If you configure the HA pair in server mode, the IMAP service can also be checked.
With local network interface monitoring and local hard drive monitoring, the primary unit monitors its own network interfaces and hard drives.
If service monitoring detects a failure, the effective HA operating mode of the primary unit switches to off or failed (depending on the On failure setting) and, if configured, the FortiMail units send HA event alert email, record HA event log messages, and send HA event SNMP traps. A failover then occurs, and the effective HA operating mode of the secondary unit switches to primary. For information on the On failure option, see Configuring the HA mode and group. For information on the effective HA operating mode, see Monitoring the HA status.
For example, if service monitoring detects that port2 on the primary unit has failed, the primary unit records a log message similar to the following.
date=2005-11-18 time=18:20:31 device_id=FE-4002905500194 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: local problem detected (port2), shutting down"
The primary unit also sends an alert email similar to the following:
Subject: monitord: local problem detected (port2), shutting down [primary-host-name]
This is the FortiMail HA unit at 10.0.0.1.
A local problem (port2) has been detected, telling remote to take over and shutting down.
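If you collect HA event log messages for review, the key=value layout shown in the log message above is straightforward to split into fields. The following Python sketch is one possible approach and is not part of FortiMail; the function name parse_ha_log is hypothetical, and the sample line is the log message from the example above.

```python
import shlex

def parse_ha_log(line: str) -> dict:
    """Split a key=value event log line into a dictionary."""
    fields = {}
    # shlex keeps the quoted msg="..." value together as a single token.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            fields[key] = value
    return fields

sample = ('date=2005-11-18 time=18:20:31 device_id=FE-4002905500194 '
          'log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha '
          'action=unknown status=success '
          'msg="monitord: local problem detected (port2), shutting down"')
record = parse_ha_log(sample)
print(record["subtype"], "|", record["msg"])
```

Tokens without an equals sign, such as a bare date and time at the start of a line, are skipped rather than treated as errors.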
Remote service monitoring is useful in addition to, or sometimes as a backup alternative to, the heartbeat. The heartbeat tests the general responsiveness of the primary unit, but it does not test for the failure of individual services that email users may be using, such as POP3 or webmail. The heartbeat also does not monitor for the failure of network interfaces that carry non-heartbeat traffic. Remote service monitoring therefore provides more specific failover monitoring. Additionally, if the heartbeat link is briefly disconnected, enabling HA service monitoring can prevent a false failover by acting as a temporary secondary heartbeat. For information on treating service monitoring as a secondary heartbeat, see Remote services as heartbeat.
To configure service monitoring
- Go to System > High Availability > Configuration.
- Select primary or secondary as the mode of operation.
- Expand the service monitor area, if required.
- Select a row in the table and click Edit to configure it.
- For Remote SMTP, Remote IMAP, Remote POP, and Remote HTTP services, configure the following:
GUI item |
Description |
Enable |
Select to enable connection responsiveness tests for SMTP. |
Name |
Displays the service name. |
Remote IP |
Enter the peer IP address. |
Port |
Enter the port number of the peer SMTP service. |
Timeout |
Enter the timeout period for one connection test. |
Interval |
Enter the frequency of the tests. |
Retries |
Enter the number of consecutively failed tests that are allowed before the primary unit is deemed unresponsive and a failover occurs. |
- For interface monitoring and local hard drive monitoring, configure the following:
GUI item |
Description |
Enable |
Enable local hard drive monitoring to check if the local hard drive is still accessible, or if the mail data disk is almost full. If the hard disk is not responsive, or if the mail data disk is 95 percent full, a failover will occur. Interface monitoring is enabled when you configure interface monitoring. See Configuring interface monitoring. Network interface monitoring tests all active network interfaces whose:
- Virtual IP action setting is not Ignore
- Configuring interface monitoring setting is enabled
For details, see Configuring interface monitoring and Virtual IP action. |
Interval |
Enter the frequency of the test. |
Retries |
Specify the number of consecutively failed tests that are allowed before the local interface or hard drive is deemed unresponsive and a failover occurs. |
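The Interval and Retries settings above describe a consecutive-failure counter, and the hard drive check adds a fullness threshold. The following Python sketch models both. It is an illustration only: the function names are hypothetical, and the exact boundary (whether the failover happens on the failure that reaches the Retries value or on the one after it) is not stated in this section, so the sketch assumes the latter.

```python
from typing import Iterable

def failed_after_retries(results: Iterable[bool], retries: int) -> bool:
    """Return True once more than `retries` tests in a row have failed.

    `results` holds the outcome of each periodic test (True = success).
    Assumption: `retries` consecutive failures are tolerated and the next
    consecutive failure triggers the failover.
    """
    consecutive = 0
    for ok in results:
        # Any successful test resets the counter.
        consecutive = 0 if ok else consecutive + 1
        if consecutive > retries:
            return True
    return False

def disk_needs_failover(used_bytes: int, total_bytes: int,
                        threshold_percent: float = 95.0) -> bool:
    """Model the 'mail data disk is 95 percent full' condition."""
    return total_bytes > 0 and 100.0 * used_bytes / total_bytes >= threshold_percent
```

In this model a single successful test between failures resets the count, which is why brief interruptions shorter than the retry allowance do not cause a failover.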
See also
About the heartbeat and synchronization
About logging, alert email and SNMP in HA
Storing mail data on a NAS server
Configuring the HA mode and group
Example: Active-passive HA group in gateway mode
Example: Failover scenarios
This section describes basic FortiMail active-passive HA failover scenarios. For each scenario, refer to the HA group shown in the following figure. To simplify the descriptions of these scenarios, the following abbreviations are used:
- P1 is the configured primary unit.
- S2 is the configured secondary unit.
Example active-passive HA group
This section contains the following HA failover scenarios:
- Failover scenario 1: Temporary failure of the primary unit
- Failover scenario 2: System reboot or reload of the primary unit
- Failover scenario 3: System reboot or reload of the secondary unit
- Failover scenario 4: System shutdown of the secondary unit
- Failover scenario 5: Primary heartbeat link fails
- Failover scenario 6: Network connection between primary and secondary units fails (remote service monitoring detects a failure)
Failover scenario 1: Temporary failure of the primary unit
In this scenario, the primary unit (P1) fails because of a software failure or a recoverable hardware failure (in this example, the P1 power cable is unplugged). HA logging and alert email are configured for the HA group.
When the secondary unit (S2) detects that P1 has failed, S2 becomes the new primary unit and continues processing email.
Here is what happens during this process:
- The FortiMail HA group is operating normally.
- The power is accidentally disconnected from P1.
- S2’s primary heartbeat test detects that P1 has failed.
How soon this happens depends on the HA daemon configuration of S2.
- The effective HA operating mode of S2 changes to primary.
- S2 sends an alert email similar to the following, indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to primary.
This is the HA machine at 172.16.5.11.
The following event has occurred
‘PRIMARY heartbeat disappeared’
The state changed from ‘SECONDARY’ to ‘PRIMARY’
- S2 records the following event log messages (among others) indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to primary.
2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: peer stop responding (heartbeat), assuming PRIMARY role"
2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop stopping"
2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop stopping"
2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop stopping"
2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop starting, entering primary mode"
2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop starting, entering primary mode"
2009-11-30 13:33:34 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop starting, entering PRIMARY mode"
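The alert email in this scenario reports the role change in a fixed sentence, so a mail filter or monitoring script can extract the old and new states from it. The following Python sketch shows one way to do that with a regular expression. The function name parse_state_change is hypothetical, and the pattern is based only on the sample messages in this section, which use both straight and typographic quotation marks.

```python
import re

# Accept straight or typographic single quotes around the state names.
_QUOTE = "['\u2018\u2019]"
_STATE_CHANGE = re.compile(
    "The state changed from " + _QUOTE + r"(\w+)" + _QUOTE +
    " to " + _QUOTE + r"(\w+)" + _QUOTE)

def parse_state_change(body: str):
    """Return (old_state, new_state) from an alert email body, or None."""
    match = _STATE_CHANGE.search(body)
    return match.groups() if match else None

email_body = """This is the HA machine at 172.16.5.11.
The following event has occurred
\u2018PRIMARY heartbeat disappeared\u2019
The state changed from \u2018SECONDARY\u2019 to \u2018PRIMARY\u2019"""
print(parse_state_change(email_body))
```

Messages that carry no state change, such as the reboot notifications shown later in this section, return None.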
Recovering from temporary failure of the primary unit
After P1 recovers from the hardware failure, what happens next to the HA group depends on P1’s HA On failure settings under System > High Availability > Configuration.
HA On Failure settings
- switch off
P1 will not process email or join the HA group until you manually select the effective HA operating mode (see Using high availability (HA)).
On recovery, P1’s effective HA operating mode resumes its configured primary role. This also means that S2 needs to give back the primary role to P1. This behavior may be useful if the cause of failure is temporary and rare, but may cause problems if the cause of failure is permanent or persistent.
In this case, S2 sends another alert email similar to the following:
This is the HA machine at 172.16.5.11.
The following event has occurred
‘SECONDARY asks us to switch roles (recovery after a restart)’
The state changed from ‘PRIMARY’ to ‘SECONDARY’
After recovery, P1 also sends out an alert email similar to the following:
This is the HA machine at 172.16.5.10.
The following critical event was detected
The system was shutdown!
On recovery, P1’s effective HA operating mode becomes secondary, and S2 continues to assume the primary role. P1 then synchronizes the content of its MTA queue directories with the current primary unit, S2. S2 can then deliver email that existed in P1’s MTA queue directory at the time of the failover. For information on manually restoring the FortiMail unit to acting in its configured HA mode of operation, see Using high availability (HA).
Failover scenario 2: System reboot or reload of the primary unit
If you need to reboot or reload (not shut down) P1 for any reason, such as a firmware upgrade or a process restart, by using the CLI commands execute reboot or execute reload <httpd...>, or by clicking System > Reboot from the top-right corner of the GUI:
- P1 will send a holdoff command to S2 so that S2 will not take over the primary role during P1’s reboot.
- P1 will also send out an alert email similar to the following:
This is the HA machine at 172.16.5.10.
The following critical event was detected
The system is rebooting (or reloading)!
- S2 will hold off checking the services and heartbeat with P1. Note that S2 will only hold off for about 15 minutes. In case P1 never boots up, S2 will take over the primary role.
- S2 will send out an alert email, indicating that S2 received the holdoff command from P1.
This is the HA machine at 172.16.5.11.
The following event has occurred
‘peer rebooting (or reloading)’
The state changed from ‘SECONDARY’ to ‘HOLD_OFF’
After P1 is up again:
- P1 will send another command to S2 and ask S2 to change its state from holdoff to secondary and resume monitoring P1’s services and heartbeat.
- S2 will send out an alert email, indicating that S2 received instruction commands from P1.
This is the HA machine at 172.16.5.11.
The following event has occurred
‘peer command appeared’
The state changed from ‘HOLD_OFF’ to ‘SECONDARY’
- S2 logs the event in the HA logs.
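The hold-off behavior in this scenario amounts to a small timed state machine on S2. The following Python sketch models it under stated assumptions: the function and constant names are hypothetical, and the 15-minute limit is taken from the approximate figure given above, not from a documented exact value.

```python
HOLD_OFF_LIMIT_SECONDS = 15 * 60  # "about 15 minutes"; exact value assumed

def next_secondary_state(state: str, seconds_in_state: float,
                         peer_resumed: bool) -> str:
    """Return S2's next state in the reboot or reload scenario above."""
    if state != "HOLD_OFF":
        return state  # this sketch only models the hold-off period
    if peer_resumed:
        # P1 is back and has told S2 to resume monitoring.
        return "SECONDARY"
    if seconds_in_state >= HOLD_OFF_LIMIT_SECONDS:
        # P1 never came back, so S2 takes over the primary role.
        return "PRIMARY"
    return "HOLD_OFF"
```

The check for a resumed peer comes first, so a primary unit that returns at the last moment keeps its role in this model.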
Failover scenario 3: System reboot or reload of the secondary unit
If you need to reboot or reload (not shut down) S2 for any reason, such as a firmware upgrade or a process restart, by using the CLI commands execute reboot or execute reload <httpd...>, or by clicking System > Reboot from the top-right corner of the GUI, the behavior of P1 and S2 is as follows:
- P1 will send out an alert email similar to the following, informing the administrator of the heartbeat loss with S2.
This is the HA machine at 172.16.5.10.
The following event has occurred
‘ha: SECONDARY heartbeat disappeared’
- S2 will send out an alert email similar to the following:
This is the HA machine at 172.16.5.11.
The following critical event was detected
The system is rebooting (or reloading)!
- P1 will also log this event in the HA logs.
For FortiMail v4.0 and older releases:
|
Failover scenario 4: System shutdown of the secondary unit
If you shut down S2:
- No alert email is sent out from either P1 or S2.
- P1 will log this event in the HA logs.
Failover scenario 5: Primary heartbeat link fails
If the primary heartbeat link fails, such as when the cable becomes accidentally disconnected, and if you have not configured a secondary heartbeat link, the FortiMail units in the HA group cannot verify that other units are operating and assume that the other has failed. As a result, the secondary unit (S2) changes to operating as a primary unit, and both FortiMail units are acting as primary units.
Two primary units connected to the same network may cause address conflicts on your network because matching interfaces will have the same IP addresses. Additionally, because the heartbeat link is interrupted, the FortiMail units in the HA group cannot synchronize configuration changes or mail data changes.
Even after reconnecting the heartbeat link, both units will continue operating as primary units. To return the HA group to normal operation, you must connect to the web-based manager of S2 to restore it as the secondary unit.
- The FortiMail HA group is operating normally.
- The heartbeat link Ethernet cable is accidentally disconnected.
- S2’s HA heartbeat test detects that the primary unit has failed.
How soon this happens depends on the HA daemon configuration of S2.
- The effective HA operating mode of S2 changes to primary.
- S2 sends an alert email similar to the following, indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to primary.
This is the HA machine at 172.16.5.11.
The following event has occurred
‘PRIMARY heartbeat disappeared’
The state changed from ‘SECONDARY’ to ‘PRIMARY’
- S2 records the following event log messages (among others) indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to primary.
2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: peer stop responding (heartbeat), assuming PRIMARY role"
2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop stopping"
2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop stopping"
2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop stopping"
2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop starting, entering primary mode"
2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop starting, entering primary mode"
2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop starting, entering PRIMARY mode"
Recovering from a heartbeat link failure
Because the hardware failure is not permanent (that is, the failure of the heartbeat link was caused by a disconnected cable, not a failed port on one of the FortiMail units), you may want to return both FortiMail units to operating in their configured modes when rejoining the failed primary unit to the HA group.
To return to normal operation after the heartbeat link fails
- Reconnect the primary heartbeat interface by reconnecting the heartbeat link Ethernet cable.
Even though the effective HA operating mode of S2 is primary, S2 continues to attempt to find the other primary unit. When the heartbeat link is reconnected, S2 finds P1 and determines that P1 is also operating as a primary unit. So S2 sends a heartbeat signal to notify P1 to stop operating as a primary unit. The effective HA operating mode of P1 changes to off.
- P1 sends an alert email similar to the following, indicating that P1 has stopped operating as the primary unit.
This is the HA machine at 172.16.5.10
The following event has occurred
'SECONDARY asks us to switch roles (user requested takeover)'
The state changed from 'PRIMARY' to 'OFF'
- P1 records the following event log messages (among others) indicating that P1 is switching to off mode.
2005-11-30 17:13:06 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: remote detected problem, shutting down"
2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop stopping"
2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop stopping"
2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop stopping"
2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop starting, entering off mode"
2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop starting, entering off mode"
The configured HA mode of operation of P1 is primary and the effective HA operating mode of P1 is off.
The configured HA mode of operation of S2 is secondary and the effective HA operating mode of S2 is primary.
P1 synchronizes the content of its MTA queue directories to S2. Email in these directories can now be delivered by S2.
- Connect to the web-based manager of P1, go to System > High Availability > Status.
- Check for synchronization messages.
Do not proceed to the next step until P1 has synchronized with S2.
- Connect to the web-based manager of S2, go to System > High Availability > Status and select click HERE to restore configured operating mode.
The HA group should return to normal operation. P1 records the following event log message (among others) indicating that S2 asked P1 to return to operating as the primary unit.
2005-11-30 18:10:00 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: being asked to assume original role"
- P1 and S2 synchronize their MTA queue directories. All email in these directories can now be delivered by P1.
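Two conditions come up repeatedly in this scenario: both units acting as primary at the same time, and a unit whose effective HA operating mode differs from its configured mode. The following Python sketch checks a pair of units for both conditions. It is a hypothetical helper for reasoning about the states described here; the HaUnit class and the ha_group_issues function are assumptions and do not correspond to any FortiMail interface.

```python
from dataclasses import dataclass

@dataclass
class HaUnit:
    name: str
    configured_mode: str   # "primary" or "secondary"
    effective_mode: str    # "primary", "secondary", "off", or "failed"

def ha_group_issues(units):
    """Return human-readable findings for an active-passive HA pair."""
    issues = []
    primaries = [u.name for u in units if u.effective_mode == "primary"]
    if len(primaries) > 1:
        # Both units process email and may claim the same IP addresses.
        issues.append("two effective primary units: " + ", ".join(primaries))
    elif not primaries:
        issues.append("no effective primary unit")
    for u in units:
        if u.effective_mode != u.configured_mode:
            issues.append(f"{u.name}: configured {u.configured_mode}, "
                          f"effective {u.effective_mode}")
    return issues

# State after the heartbeat link is reconnected and P1 has switched to off.
print(ha_group_issues([HaUnit("P1", "primary", "off"),
                       HaUnit("S2", "secondary", "primary")]))
```

An empty result corresponds to normal operation, where each unit's effective mode matches its configured mode and exactly one unit is primary.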
Failover scenario 6: Network connection between primary and secondary units fails (remote service monitoring detects a failure)
Depending on your network configuration, the network connection between the primary and secondary units can fail for a number of reasons. In the network configuration shown in Example active-passive HA group, the connection between port1 of the primary unit (P1) and port1 of the secondary unit (S2) can fail if a network cable is disconnected or if the switch between P1 and S2 fails.
A more complex network configuration could include a number of network devices between the primary and secondary unit’s non-heartbeat network interfaces. In any configuration, remote service monitoring can only detect a communication failure. Remote service monitoring cannot determine where the failure occurred or the reason for the failure.
In this scenario, remote service monitoring has been configured to make sure that S2 can connect to P1. The On failure setting located in the HA main configuration section is wait for recovery then restore secondary role. For information on the On failure setting, see On failure. For information about remote service monitoring, see Configuring service-based failover.
The failure occurs when power to the switch that connects the P1 and S2 port1 interfaces is disconnected. Remote service monitoring detects the failure of the network connection between the primary and secondary units. Because of the On failure setting, P1 changes its effective HA operating mode to failed.
When the failure is corrected, P1 detects the correction because while operating in failed mode P1 has been attempting to connect to S2 using the port1 interface. When P1 can connect to S2, the effective HA operating mode of P1 changes to secondary and the mail data on P1 will be synchronized to S2. S2 can now deliver this mail. The HA group continues to operate in this manner until an administrator resets the effective HA modes of operation of the FortiMail units.
- The FortiMail HA group is operating normally.
- The power cable for the switch between P1 and S2 is accidentally disconnected.
- S2’s remote service monitoring cannot connect to the primary unit.
How soon this happens depends on the remote service monitoring configuration of S2.
- Through the HA heartbeat link, S2 signals P1 to stop operating as the primary unit.
- The effective HA operating mode of P1 changes to failed.
- The effective HA operating mode of S2 changes to primary.
- S2 sends an alert email similar to the following, indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to primary.
This is the HA machine at 172.16.5.11.
The following event has occurred
‘PRIMARY remote service disappeared’
The state changed from ‘SECONDARY’ to ‘PRIMARY’
- S2 records event log messages similar to the following (among others), indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to primary.
2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: peer stop responding (heartbeat), assuming PRIMARY role"
2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop stopping"
2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop stopping"
2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop stopping"
2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop starting, entering primary mode"
2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop starting, entering primary mode"
2005-01-30 16:27:18 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop starting, entering PRIMARY mode"
- P1 sends an alert email similar to the following, indicating that P1 has stopped operating in HA mode.
This is the HA machine at 172.16.5.10.
The following event has occurred
'SECONDARY asks us to switch roles (user requested takeover)'
The state changed from 'PRIMARY' to 'FAILED'
- P1 records the following log messages (among others) indicating that P1 is switching to Failed mode.
2005-11-30 17:13:06 log_id=0107000000 type=event subtype=ha pri=notice user=ha ui=ha action=unknown status=success msg="monitord: remote detected problem, shutting down"
2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: main loop stopping"
2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop stopping"
2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop stopping"
2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop starting, entering off mode"
2005-11-30 17:13:16 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop starting, entering failed mode"
Recovering from a network connection failure
Because the network connection failure was not caused by failure of either FortiMail unit, you may want to return both FortiMail units to operating in their configured modes when rejoining the failed primary unit to the HA group.
To return to normal operation after the network connection fails
- Reconnect power to the switch.
Because the effective HA operating mode of P1 is failed, P1 is using remote service monitoring to attempt to connect to S2 through the switch.
- When the switch resumes operating, P1 successfully connects to S2.
P1 has determined that S2 can connect to the network and process email.
- The effective HA operating mode of P1 switches to secondary.
- P1 logs the event.
2009-11-30 16:02:08 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="configd: main loop starting, entering primary mode"
2009-11-30 16:02:08 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="backupd: main loop starting, entering primary mode"
2009-11-30 16:02:13 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: starting pre-amble"
2009-11-30 16:02:13 log_id=0107000000 type=event subtype=ha pri=information user=ha ui=ha action=unknown status=success msg="monitord: ** response from peer, setting to SECONDARY mode"
- P1 sends an alert email similar to the following, indicating that P1 is switching its effective HA operating mode to secondary.
This is the HA machine at 172.16.5.10.
The following event has occurred
'SECONDARY asks us to switch roles (user requested takeover)'
The state changed from 'FAILED' to 'SECONDARY'
- P1 synchronizes the content of its MTA queue directories to S2. S2 can now deliver all email in these directories.
The HA group can continue to operate with S2 as the primary unit and P1 as the secondary unit. However, you can use the following steps to restore each unit to its configured HA mode of operation.
- Connect to the web-based manager of P1 and go to System > High Availability > Status.
- Check for synchronization messages.
Do not proceed to the next step until P1 has synchronized with S2.
- Connect to the web-based manager of S2, go to System > High Availability > Status and select click HERE to restore configured operating mode.
- Connect to the web-based manager of P1, go to System > High Availability > Status and select click HERE to restore configured operating mode.
P1 should return to operating as the primary unit and S2 should return to operating as the secondary unit.
- P1 and S2 synchronize their MTA queue directories again. P1 can now deliver all email in these directories.
Example: Active-passive HA group in gateway mode
In this example, two FortiMail-400 units are configured to operate in gateway mode as an active-passive HA group.
The procedures in this example describe HA configuration necessary to achieve this scenario. Before beginning, verify that both of the FortiMail units are already:
- physically connected according to Virtual IP address for HA failover
- operating in gateway mode
- configured with the IP addresses for their port3 and port1 network interfaces according to Virtual IP address for HA failover, with the exception of the HA virtual IP address that will be configured in this example (for details, see Editing network interfaces)
- allowing HTTPS administrative access through their port1 network interfaces according to Virtual IP address for HA failover
Virtual IP address for HA failover
The active-passive HA group is located on a private network with email users and the protected email server. All are behind a FortiGate unit which separates the private network from the Internet. The DNS server, remote email users, and external SMTP servers are located on the Internet.
For both FortiMail units:
port1 | |
port3 | |
port6 | |
The secondary unit will become the new primary unit when a failover occurs. In order for it to receive the connections formerly destined for the failed primary unit, the new primary unit must adopt the failed primary unit’s IP address. You will configure an HA virtual IP address on port3 for this purpose.
While the configured primary unit is functional, the HA virtual IP address is associated with its port3 network interface, which receives email connections. After a failover, the HA virtual IP address becomes associated with the new primary unit’s port3. As a result, after a failover, the new primary unit (originally the secondary unit) will then receive and process the email connections.
This example contains the following topics:
- About standalone versus HA deployment
- Configuring the DNS and firewall settings
- Configuring the primary unit for HA operation
- Configuring the secondary unit for HA operation
- Administering an HA group
About standalone versus HA deployment
If you plan to convert a standalone FortiMail unit to a member of an HA group, first compare the HA deployment shown in Virtual IP address for HA failover with a standalone deployment, so that you understand which settings you need to change.
Examine the network interface configuration of a standalone FortiMail-400 unit in the following table.
Example standalone network interface configuration
Network interface | IP address | Description |
---|---|---|
port1 | 192.168.1.5 | Administrative connections to the FortiMail unit. |
port2, port4 | Default | Not connected. |
port3 | 172.16.1.2 | Email connections to the FortiMail unit; the target of your email DNS A records (No administrative access). |
port5 | Default | Not connected. |
port6 | Default | Not connected. |
Similarly, for the HA group, DNS A records should target the IP address of the port3 interface of the primary FortiMail-400 unit. Additionally, administrators should administer each FortiMail unit in the HA group by connecting to the IP address of each FortiMail unit’s port1.
If a failover occurs, the network must be able to direct traffic to port3 of the secondary unit without reconfiguring the DNS A record target. The secondary unit must cleanly and automatically substitute for the primary unit, as if they were a single, standalone unit.
Unlike the configuration of the standalone unit, for the HA group to accomplish that substitution, all email connections must use an IP address that transfers between the primary unit and the secondary unit according to which is currently the primary unit. This transferable IP address can be accomplished by configuring the HA group to either:
- set the IP address of the current primary unit’s network interface
- add a virtual IP address to the current primary unit’s network interface
In this example, the HA group uses the method of adding a virtual IP address. Email connections will not use the actual IP address of port3. Instead, all email connections will use only the virtual IP address 172.16.1.2, which is used by port3 of whichever FortiMail unit’s effective HA operating mode is currently primary. During normal HA group operation, this IP address resides on the primary unit. Conversely, after a failover occurs, this IP address resides on the former secondary unit (now the current primary unit).
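The way the virtual IP address follows the effective primary unit can be illustrated with a small model. This is only a conceptual Python sketch, not FortiMail code: the `Unit` class and `virtual_ip_holder` function are hypothetical names, the unit labels P1 and S2 are illustrative, and the actual port3 addresses are the ones used for the two units in this example.

```python
from dataclasses import dataclass

VIRTUAL_IP = "172.16.1.2"   # HA virtual IP address from this example

@dataclass
class Unit:
    name: str
    port3_ip: str            # actual port3 address, not used for email
    effective_mode: str      # "primary", "secondary" or "failed"

def virtual_ip_holder(units):
    """Return the unit whose port3 currently answers for the virtual IP."""
    primaries = [u for u in units if u.effective_mode == "primary"]
    if len(primaries) != 1:
        raise RuntimeError("expected exactly one effective primary unit")
    return primaries[0]

p1 = Unit("P1", "172.16.1.5", "primary")
s2 = Unit("S2", "172.16.1.6", "secondary")
before = virtual_ip_holder([p1, s2]).name

# Simulated failover: P1 fails and S2 becomes the effective primary unit.
p1.effective_mode, s2.effective_mode = "failed", "primary"
after = virtual_ip_holder([p1, s2]).name
print(f"{VIRTUAL_IP}: {before} -> {after}")
```

SMTP clients keep connecting to 172.16.1.2 throughout; only the unit answering for that address changes.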
Also unlike the configuration of the standalone unit, both port5 and port6 are configured for each member of the HA group. The primary unit’s port5 is directly connected using a crossover cable to the secondary unit’s port5; the primary unit’s port6 is directly connected to the secondary unit’s port6. These links are used solely for heartbeat and synchronization traffic between members of the HA group.
For comparison with the standalone unit, examine the network configuration of the primary unit in the following table.
Example primary unit HA network interface configuration
Interface | IP/Netmask | Virtual IP address: Setting | Virtual IP address: IP address | Description |
---|---|---|---|---|
port1 | 192.168.1.5 | Ignore | | Administrative connections to this FortiMail unit. Because the IP address does not follow the primary FortiMail unit, connections to this IP address are specific to this physical unit. Administrators can still connect to this FortiMail unit after failover, which may be useful for diagnostic purposes. |
port2, port4 | Default | Ignore | | Not connected. |
port3 | 172.16.1.5 | Set | 172.16.1.2 | Email connections to the FortiMail unit; the target of your email DNS MX and A records. Connections should not be destined for the actual IP address, but instead the virtual IP address (172.16.1.2) which follows the primary FortiMail unit. No administrative access. |
port5 | 10.0.1.2 | Ignore | | Secondary heartbeat and synchronization interface. |
port6 | 10.0.0.2 | Ignore | | Primary heartbeat and synchronization interface. |
Because the Virtual IP action settings are synchronized between the primary and secondary units, you do not need to configure them separately on the secondary unit. However, you must configure the secondary unit with other settings listed in the following table.
Example secondary unit HA network interface configuration
Interface | IP/Netmask | Virtual IP address: Setting | Virtual IP address: IP address | Description |
---|---|---|---|---|
port1 | 192.168.1.6 | (synchronized from primary unit) | (synchronized from primary unit) | Administrative connections to this FortiMail unit. Because the IP address does not follow the primary FortiMail unit, connections to this IP address are specific to this physical unit. Administrators can connect to this FortiMail unit even when it is currently the secondary unit, which may be useful for HA configuration and log viewing. |
port2, port4 | Default | (synchronized from primary unit) | (synchronized from primary unit) | Not connected. |
port3 | 172.16.1.6 | (synchronized from primary unit) | (synchronized from primary unit) | Connections should not be destined for the actual IP address, but instead the virtual IP address (172.16.1.2) which follows the primary FortiMail unit. As a result, no connections should be destined for this network interface until a failover occurs, causing the secondary unit to become the new primary unit. No administrative access. |
port5 | 10.0.1.4 | (synchronized from primary unit) | (synchronized from primary unit) | Secondary heartbeat and synchronization interface. |
port6 | 10.0.0.4 | (synchronized from primary unit) | (synchronized from primary unit) | Primary heartbeat and synchronization interface. |
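Before cabling the units, it can help to sanity-check the heartbeat address plan in the two tables above. The following Python sketch uses the standard `ipaddress` module to confirm that each heartbeat interface on the primary unit shares a subnet with the same interface on the secondary unit. The netmask 255.255.255.0 is the one used for port5 and port6 in this example; the helper name `same_subnet` is hypothetical.

```python
import ipaddress

NETMASK = "255.255.255.0"   # netmask used for port5 and port6 in this example

# (interface, primary unit address, secondary unit address)
HEARTBEAT_LINKS = [
    ("port6", "10.0.0.2", "10.0.0.4"),   # primary heartbeat
    ("port5", "10.0.1.2", "10.0.1.4"),   # secondary heartbeat
]

def same_subnet(addr_a, addr_b, netmask):
    """True if both addresses fall in the same network for this netmask."""
    net_a = ipaddress.ip_interface(f"{addr_a}/{netmask}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{netmask}").network
    return net_a == net_b

results = {port: same_subnet(a, b, NETMASK) for port, a, b in HEARTBEAT_LINKS}
# In this example the two heartbeat links are on different subnets.
distinct = not same_subnet("10.0.0.2", "10.0.1.2", NETMASK)
print(results, distinct)
```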
Configuring the DNS and firewall settings
In the example shown in Virtual IP address for HA failover, SMTP clients will connect to the virtual IP address of the primary unit. For SMTP clients on the Internet, this connection occurs through the public network virtual IP on the FortiGate unit, whose policies allow the connections and route them to the virtual IP on the current primary unit.
Because the FortiMail HA group is installed behind a firewall performing NAT, the DNS server hosting records for the domain example.com must be configured to reflect the public IP address of the FortiGate unit, rather than the private network IP address of the HA group.
The DNS server has been configured with:
- an MX record to indicate that the FortiMail unit is the email gateway for example.com
- an A record to resolve fortimail.example.com into the FortiGate unit’s public IP address
- a reverse DNS record to enable external email servers to resolve the public IP address of the FortiGate unit into the domain name of the FortiMail unit
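The three records listed above can be sketched in zone-file style. The following Python snippet only formats illustrative record lines; it does not query or configure a DNS server. The public IP address 203.0.113.10 is a documentation-range placeholder, the MX preference of 10 is an assumption, and `dns_records` is a hypothetical helper name. In practice the reverse (PTR) record lives in a separate reverse zone, which is often managed by whoever owns the public address block.

```python
import ipaddress

def dns_records(domain, mail_host, public_ip, preference=10):
    """Return zone-file style lines for the MX, A and PTR records."""
    fqdn = f"{mail_host}.{domain}."
    # reverse_pointer yields e.g. "10.113.0.203.in-addr.arpa"
    ptr_name = ipaddress.ip_address(public_ip).reverse_pointer + "."
    return [
        f"{domain}. IN MX {preference} {fqdn}",
        f"{fqdn} IN A {public_ip}",
        f"{ptr_name} IN PTR {fqdn}",
    ]

# 203.0.113.10 stands in for the FortiGate unit's public IP address.
records = dns_records("example.com", "fortimail", "203.0.113.10")
print("\n".join(records))
```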
Configuring the primary unit for HA operation
The following procedure describes how to prepare a FortiMail unit for HA operation as the primary unit according to Virtual IP address for HA failover.
In a typical standalone gateway mode configuration, you might set the IP address of the FortiMail-400 unit’s port3 network interface to 172.16.1.2. The FortiGate unit would be configured to NAT email connections to and from that IP address.
To simulate the same configuration with the active-passive HA group, you will set the actual IP addresses of the port3 interfaces of the primary and secondary units to different IP addresses. Then, in the HA options, you will add a virtual IP address of 172.16.1.2 to port3.
Before beginning this procedure, verify that you have completed the required preparations described in Example: Active-passive HA group in gateway mode.
To configure the primary unit for HA operation
- Connect to the web-based manager of the primary unit at https://192.168.1.5/admin.
- Go to System > Network > Interface.
- Configure port6 to 10.0.0.2/255.255.255.0 and port5 to 10.0.1.2/255.255.255.0.
- Go to System > High Availability > Configuration.
- Configure the following:

Section | Setting | Value |
---|---|---|
HA Configuration section | Mode of operation | primary |
 | On failure | wait for recovery then assume secondary role |
 | Shared password | change_me |
Backup options section | Backup mail data directories | enabled |
 | Backup MTA queue directories | disabled |
Advanced options section | HA base port | 2000 |
 | Heartbeat lost threshold | 15 seconds |
 | Remote services as heartbeat | disabled |
Interface section | Interface | port6 |
 | Enable port monitor | Enabled |
 | Heartbeat status | Primary |
 | Peer IP address | 10.0.0.4 |
 | Interface | port5 |
 | Enable port monitor | Enabled |
 | Heartbeat status | Secondary |
 | Peer IP address | 10.0.1.4 |
Virtual IP Address | port1 | Ignore |
 | port2 | Ignore |
 | port3 | Set 172.16.1.2/255.255.255.0 |
 | port4 | Ignore |
 | port5 | Ignore |
 | port6 | Ignore |

- Click Apply.
  The FortiMail unit switches to active-passive HA mode, and, after determining that there is no other primary unit, sets its effective HA operating mode to primary. The virtual IP 172.16.1.2 is added to port3; if not already complete, configure DNS records and firewalls to route email traffic to this virtual IP address, not the actual IP address of the port3 network interface.
- To confirm that the FortiMail unit is acting as the primary unit, go to System > High Availability > Status and compare the Configured Operating Mode and Effective Operating Mode. Both should be primary.
  If the effective HA operating mode is not primary, the FortiMail unit is not acting as the primary unit. Determine the cause of the failover, then restore the effective operating mode to that matching its configured HA mode of operation.
Configuring the secondary unit for HA operation
The following procedure describes how to prepare a FortiMail unit for HA operation as the secondary unit according to Virtual IP address for HA failover.
Before beginning this procedure, verify that you have completed the required preparations described in Example: Active-passive HA group in gateway mode. Also verify that you configured the primary unit as described in Configuring the primary unit for HA operation.
To configure the secondary unit for HA operation
- Connect to the web-based manager of the secondary unit at https://192.168.1.6/admin.
- Go to System > Network > Interface.
- Configure port6 to 10.0.0.4/255.255.255.0 and port5 to 10.0.1.4/255.255.255.0.
- Go to System > High Availability > Configuration.
- Configure the following:

Section | Setting | Value |
---|---|---|
Main Configuration section | Mode of operation | secondary |
 | On failure | wait for recovery then restore secondary role |
 | Shared password | change_me |
Backup options section | Backup mail data directories | enabled |
 | Backup MTA queue directories | disabled |
Advanced options section | HA base port | 2000 |
 | Heartbeat lost threshold | 15 seconds |
 | Remote services as heartbeat | disabled |
Interface section | Interface | port6 |
 | Heartbeat status | primary |
 | Peer IP address | 10.0.0.2 |
 | Interface | port5 |
 | Heartbeat status | secondary |
 | Peer IP address | 10.0.1.2 |
Virtual IP Address | (The configuration of the ports is synchronized from the primary unit, and therefore does not need to be configured on the secondary unit.) | |
 | port1 | Ignore |
 | port2 | Ignore |
 | port3 | Set 172.16.1.2/255.255.255.0 |
 | port4 | Ignore |
 | port5 | Ignore |
 | port6 | Ignore |

- Click Apply.
  The FortiMail unit switches to active-passive HA mode, and, after determining that the primary unit is available, sets its effective HA operating mode to secondary.
- Go to System > High Availability > Status.
- Select click HERE to start a configuration/data sync.
  The secondary unit synchronizes its configuration with the primary unit, including Virtual IP action settings that configure the HA virtual IP that the secondary unit will adopt on failover.
- To confirm that the FortiMail unit is acting as the secondary unit, go to System > High Availability > Status and compare the Configured Operating Mode and Effective Operating Mode. Both should be secondary.
  If the effective HA operating mode is not secondary, the FortiMail unit is not acting as the secondary unit. Determine the cause of the failover, then restore the effective operating mode to that matching its configured HA mode of operation.

If the heartbeat interfaces are not connected, the secondary unit cannot connect to the primary unit, and so the secondary unit will operate as though the primary unit has failed and will switch its effective HA operating mode to primary.
When both the primary unit and the secondary unit are operating in their configured modes, configuration of the active-passive HA group is complete. For information on managing both members of the HA group, see Administering an HA group.
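The settings entered on the two units must line up: in this example the shared password and HA base port are identical on both units, and each heartbeat interface's peer IP address is the other unit's address on the same interface. The following Python sketch records the example values as plain dictionaries and checks that they mirror each other. The dictionary layout and the function `pairing_problems` are hypothetical and are not a FortiMail configuration format or API.

```python
PRIMARY = {
    "mode": "primary",
    "shared_password": "change_me",
    "ha_base_port": 2000,
    "heartbeat": {
        "port6": {"local": "10.0.0.2", "peer": "10.0.0.4"},
        "port5": {"local": "10.0.1.2", "peer": "10.0.1.4"},
    },
}
SECONDARY = {
    "mode": "secondary",
    "shared_password": "change_me",
    "ha_base_port": 2000,
    "heartbeat": {
        "port6": {"local": "10.0.0.4", "peer": "10.0.0.2"},
        "port5": {"local": "10.0.1.4", "peer": "10.0.1.2"},
    },
}

def pairing_problems(a, b):
    """Return a list of mismatches between two units' HA settings."""
    problems = []
    if {a["mode"], b["mode"]} != {"primary", "secondary"}:
        problems.append("modes must be one primary and one secondary")
    for key in ("shared_password", "ha_base_port"):
        if a[key] != b[key]:
            problems.append(f"{key} differs between units")
    for port, link in a["heartbeat"].items():
        other = b["heartbeat"].get(port)
        if other is None:
            problems.append(f"{port} not configured on both units")
        elif link["peer"] != other["local"] or other["peer"] != link["local"]:
            problems.append(f"{port} peer addresses do not mirror each other")
    return problems

problems = pairing_problems(PRIMARY, SECONDARY)
print(problems or "settings are consistent")
```

A check like this is useful mainly as a worksheet review before entering values in the web-based manager; it does not replace confirming the effective operating mode on the Status tab.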
Administering an HA group
In most cases, you will administer an HA group by connecting to the primary unit as if it were a standalone unit.
Management tasks performed on each HA group member
Connect to... | For... |
---|---|
Primary unit (192.168.1.5) | |
Secondary unit (192.168.1.6) | |
If the initial configuration synchronization fails, such as if it is disrupted or the network cable is loose, you should manually trigger synchronization after changing the configuration of the primary unit. For information on manually triggering configuration synchronization, see Using high availability (HA).
Some parts of the configuration are not synchronized, and must be configured separately on each member of the HA group. For details, see Configuration settings that are not synchronized.