Using high availability

Go to System > High Availability to configure the FortiVoice unit to act as a high availability (HA) member in order to increase availability.

For the general procedure for enabling and configuring HA, see Enabling and configuring HA.

This section contains the following topics:

About high availability
About the heartbeat and synchronization
Enabling and configuring HA
Monitoring the HA status
Configuring service-based monitoring
Failover scenario examples

About high availability

FortiVoice units operate in an active-passive HA mode, which has the following features:

Two FortiVoice units are in the HA group.
Both configuration and data are synchronized. (For exceptions to synchronized configuration items, see Unsynchronized HA settings.)
Only the primary unit processes phone calls.
No data is lost when the hardware fails, although active calls are disconnected and line appearance and extension appearance take time to restore.
The HA group provides failover protection but no increased processing capacity.

Active-passive HA group

Both units in an HA group must be the same FortiVoice model, with the same hardware and the same firmware version.

Communications between HA members occur through the heartbeat and synchronization connection. For details, see About the heartbeat and synchronization.

To configure FortiVoice units operating in HA mode, you usually connect only to the primary unit. The primary unit’s configuration is almost entirely synchronized to the secondary unit (slave), so changes made on the primary unit are propagated to the secondary unit.

Exceptions to this rule include connecting to a secondary unit in order to view log messages recorded about the secondary unit itself on its own hard disk, and connecting to a secondary unit to configure settings that are not synchronized. For details, see Unsynchronized HA settings.

For instructions on how to enable and configure HA, see Enabling and configuring HA.

About the heartbeat and synchronization

Heartbeat and synchronization traffic consists of TCP packets transmitted between the FortiVoice units in the HA group through the primary and secondary heartbeat interfaces.

Service monitoring traffic can also, for short periods, be used as a heartbeat. For details, see Remote services as heartbeat.

Heartbeat and synchronization traffic has three primary functions:

To monitor the responsiveness of the HA group members.
To synchronize configuration changes from the primary unit to the secondary unit.
For exceptions to synchronized configuration items, see Unsynchronized HA settings.
To synchronize system and user data from the primary unit to the secondary unit.
Call data consists of the FortiVoice call detailed records, recorded calls, voicemail, call directories, fax, and voice prompts.

When the primary unit’s configuration changes, it immediately synchronizes the change to the secondary unit through the primary heartbeat interface. If this fails, or if you have inadvertently de-synchronized the secondary unit’s configuration, you can manually initiate synchronization. For details, see Click HERE to Start a Configuration/Data Sync. You can also use the CLI command diagnose system ha sync on either the primary unit or the secondary unit to manually synchronize the configuration.

During normal operation, the secondary unit expects to constantly receive heartbeat traffic from the primary unit. Loss of the heartbeat signal interrupts the HA group and generally triggers a failover. For details, see Failover scenario 1: Temporary failure of the primary unit.

Exceptions include system restarts and the execute reload CLI command. In case of a system reboot or reload of the primary unit, the primary unit signals the secondary unit to wait for the primary unit to complete the restart or reload. For details, see Failover scenario 2: System reboot or reload of the primary unit.

Periodically, the secondary unit checks with the primary unit to see if there are any configuration changes on the primary unit. If there are configuration changes, the secondary unit will pull the configuration changes from the primary unit, generate a new configuration, and reload the new configuration. In this case, both the primary and secondary units can be configured to send alert email. For details, see Failover scenario 3: System reboot or reload of the secondary unit and Configuring alert email.

Unsynchronized HA settings

All configuration settings on the primary unit are synchronized to the secondary unit, except the following:

GUI item	Description
Host name	The host name distinguishes members of the cluster.
Static route	Static routes are not synchronized because the HA units may be in different networks (see Configuring static routes).
Interface configuration	Each FortiVoice unit in the HA group must be configured with different network interface settings for connectivity purposes. For details, see Configuring the network interfaces. Exceptions include some active-passive HA settings that affect the interface configuration for failover purposes. These settings are synchronized.
Main HA configuration	The main HA configuration, which includes the HA mode of operation (such as master or slave), is not synchronized because this configuration must be different on the primary and secondary units. For details, see Configuring the HA mode and group.
HA service monitoring configuration	In active-passive HA, the HA service monitoring configuration is not synchronized. The remote service monitoring configuration on the secondary unit controls how the secondary unit checks the operation of the primary unit. The local services configuration on the primary unit controls how the primary unit tests its own operation. For details, see Configuring service-based monitoring. You might want to have a different service monitoring configuration on the primary and secondary units. For example, after a failover you may not want service monitoring to operate until you have fixed the problems that caused the failover and have restarted normal operation of the HA group.
System appearance	The appearance settings that you configured under System > Configuration > Appearance are not synchronized.
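
As an illustration of the exceptions listed above, the following minimal Python sketch filters a configuration down to the parts that would be synchronized. The section names and the dictionary layout are hypothetical and chosen only to mirror the table; they are not FortiVoice identifiers. The sketch also ignores the failover-related interface settings that the table notes are synchronized.

```python
# Hypothetical section names that mirror the table of unsynchronized settings.
UNSYNCHRONIZED_SECTIONS = {
    "host_name",
    "static_routes",
    "interface_config",
    "ha_main_config",
    "ha_service_monitoring",
    "system_appearance",
}


def sections_to_synchronize(primary_config: dict) -> dict:
    """Return only the sections that are not on the unsynchronized list."""
    return {
        name: value
        for name, value in primary_config.items()
        if name not in UNSYNCHRONIZED_SECTIONS
    }


# Example data (invented for illustration).
primary = {
    "host_name": "primary-unit",
    "static_routes": ["0.0.0.0/0 via 192.0.2.1"],
    "extensions": [1001, 1002],
    "system_appearance": {"theme": "default"},
}
synced = sections_to_synchronize(primary)
```

With the example data, only the `extensions` section remains, because every other key is on the unsynchronized list.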

Synchronization after a failover

During normal operation, extensions are in one of two states:

registered and idle
active call

When a failover occurs, active calls are interrupted and users have to reinitiate the calls. However, registered idle extensions can still make and receive phone calls without being affected.

When a failover is corrected, one of the following occurs automatically:

The secondary unit detects the failure of the primary unit, and becomes the new primary unit.
The former primary unit restarts, detects the new primary unit, and becomes a secondary unit.

You may have to manually restart the failed primary unit.

Enabling and configuring HA

In general, to enable and configure HA, you should perform the following:

Physically connect the FortiVoice units that will be members of the HA group.
You must connect at least one of their network interfaces for heartbeat and synchronization traffic between members of the group. For reliability reasons, Fortinet recommends that you connect both a primary and a secondary heartbeat interface, and that they be connected directly or through a dedicated switch that is not connected to your overall network.
On each member of the group:
- Enable the HA mode that you want to use and select whether the individual member will act as a primary unit or secondary unit. For information about the differences between the HA modes, see About high availability.
- Configure the local IP addresses of the primary and secondary heartbeat and synchronization network interfaces.
- Configure a virtual IP address that is shared by the HA group and remains the same after a failover. The virtual IP address is used to auto-provision the server IP address and the SIP trunk client IP address.
- Configure the behavior on failover, and how the network interfaces should be configured for whichever FortiVoice unit is currently acting as the primary unit.
If you want to trigger failover when hardware or a service fails, even if the heartbeat connection is still functioning, configure service monitoring. For details, see Configuring service-based monitoring.
Monitor the status of each group member. For details, see Monitoring the HA status. To monitor HA events through log messages and/or alert email, you must first enable logging of HA activity events. For details, see Configuring logging.

Monitoring the HA status

The Status tab in the High Availability submenu shows the configured HA mode of operation of a FortiVoice unit in an HA group. You can also manually initiate synchronization and reset the HA mode of operation. A reset may be required if a FortiVoice unit’s effective HA mode of operation differs from its configured HA mode of operation, such as after a failover when a configured primary unit is currently acting as a secondary unit.

For FortiVoice units operating as secondary units, the Status tab also lets you view the status and schedule of the HA synchronization daemon.

Before you can use the Status tab, you must first enable and configure HA. For details, see Enabling and configuring HA.

To view the HA mode of operation status, go to System > High Availability > Status.

GUI item

Description

HA Status

Select a time interval for refreshing the HA status page. You can also manually update the page by clicking Refresh.

Mode Status

Configured Operating Mode

Displays the HA operating mode that you configured, either:

master: Configured to be the primary unit of an active-passive group.
slave: Configured to be the secondary unit of an active-passive group.

For information on configuring the HA operating mode, see Mode of operation.

After a failure, the FortiVoice unit may not be acting in its configured HA operating mode. For details, see Effective Operating Mode.

Effective Operating Mode

Displays the mode that the unit is currently operating in, either:

master: Acting as primary unit.
slave: Acting as secondary unit.
off: For primary units, this indicates that service/interface monitoring has detected a failure and has taken the primary unit offline, triggering failover. For secondary units, this indicates that synchronization has failed once; a subsequent failure will trigger failover. For details, see On failure.
failed: Service/network interface monitoring has detected a failure and the diagnostic connection is currently determining whether the problem has been corrected or failover is required. For details, see On failure.

The configured HA operating mode matches the effective operating mode unless a failure has occurred.

For example, after a failover, a FortiVoice unit configured to operate as a secondary unit could be acting as a primary unit.

For explanations of combinations of configured and effective HA modes of operation, see Combinations of configured and effective HA modes of operation.

For information on restoring the FortiVoice unit to an effective HA operating mode that matches the configured operating mode, see Click HERE to Restore Configured Operating Mode.

Daemon Status

This option appears only for secondary units in active-passive HA groups.

Monitor

Displays the time at which the secondary unit’s HA daemon will check to make sure that the primary unit is operating correctly, and, if monitoring has detected a failure, the number of times that a failure has occurred.

Monitoring occurs through the heartbeat link between the primary and secondary units. If the heartbeat link becomes disconnected, the next time the secondary unit checks for the primary unit, the primary unit will not respond. If the maximum number of consecutive failures is reached, and no secondary heartbeat or remote service monitoring heartbeat is available, the secondary unit will change its effective HA operating mode to become the new primary unit.

For details, see HA base port.

Configuration

Displays the time at which the secondary unit’s HA daemon will synchronize the FortiVoice configuration from the primary unit to the secondary unit.

The message slave unit is currently synchronizing appears when the HA daemon is synchronizing the configuration.

For information on items that are not synchronized, see Unsynchronized HA settings.

Data

Displays the time at which the secondary unit’s HA daemon will synchronize call data from the primary unit to the secondary unit.

The message slave unit is currently synchronizing appears when the HA daemon is synchronizing data.

Actions

Click HERE to Start a Configuration/Data Sync

Click to manually initiate synchronization of the configuration and call data. For information on items that are not synchronized, see Unsynchronized HA settings.

Click HERE to Restore Configured Operating Mode

Click to reset the FortiVoice unit to an effective HA operating mode that matches the FortiVoice unit’s configured operating mode.

For example, for a configured primary unit whose effective HA operating mode is now slave, after correcting the cause of the failover, you might click this option on the primary unit to restore the configured primary unit to active duty, and restore the secondary unit to its secondary role.

If the effective HA operating mode has changed due to a failover, make sure to resolve any issues that caused the failover before selecting this option.

Combinations of configured and effective HA modes of operation

Configured operating mode	Effective operating mode	Description
master	master	Normal for the primary unit of an active-passive HA group.
slave	slave	Normal for the secondary unit of an active-passive HA group.
master	off	The primary unit has experienced a failure, or the FortiVoice unit is in the process of switching to operating in HA mode. HA processes and call processing are stopped.
slave	off	The secondary unit has detected a failure, or the FortiVoice unit is in the process of switching to operating in HA mode. After the secondary unit starts up and connects with the primary unit to form an HA group, the first configuration synchronization may fail in special circumstances. To prevent both the secondary and primary units from simultaneously acting as primary units, the effective HA mode of operation becomes off. If subsequent synchronization fails, the secondary unit’s effective HA mode of operation becomes master.
master	failed	The remote service monitoring or local network interface monitoring on the primary unit has detected a failure, and will attempt to connect to the other FortiVoice unit. If the problem that caused the failure has been corrected, the effective HA mode of operation switches from failed to slave, or to match the configured HA mode of operation, depending on the On failure setting.
master	slave	The primary unit has experienced a failure but then returned to operation. When the failure occurred, the unit configured to be the secondary unit became the primary unit. When the unit configured to be the primary unit restarted, it detected the new primary unit and so switched to operating as the secondary unit.
slave	master	The secondary unit has detected that the FortiVoice unit configured to be the primary unit failed. When the failure occurred, the unit configured to be the secondary unit became the primary unit.
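
The table above can be read as a lookup keyed on the pair of configured and effective modes. The following Python sketch shows one way to encode it, for example in a monitoring script. The function names are hypothetical and the descriptions are shortened paraphrases of the table, not text returned by the FortiVoice unit.

```python
from typing import Optional

# Keys are (configured mode, effective mode); values paraphrase the table.
HA_MODE_COMBINATIONS = {
    ("master", "master"): "normal primary unit",
    ("slave", "slave"): "normal secondary unit",
    ("master", "off"): "primary failed or is switching to HA mode",
    ("slave", "off"): "secondary detected a failure or is switching to HA mode",
    ("master", "failed"): "monitoring on the primary detected a failure",
    ("master", "slave"): "primary failed, returned, and now acts as secondary",
    ("slave", "master"): "secondary took over after the primary failed",
}


def describe_ha_state(configured: str, effective: str) -> Optional[str]:
    """Return the table description, or None for a pair the table omits."""
    return HA_MODE_COMBINATIONS.get((configured.lower(), effective.lower()))


def modes_differ(configured: str, effective: str) -> bool:
    """True when the effective mode does not match the configured mode."""
    return configured.lower() != effective.lower()
```

A pair for which `modes_differ` returns true is one where, after the cause of the failure is resolved, you might restore the configured operating mode.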

Configuring the HA mode and group

The Configuration tab in the System > High Availability submenu lets you configure the high availability (HA) options, including:

enabling HA
whether this individual FortiVoice unit will act as a primary unit or a secondary unit in the group
network interfaces that will be used for heartbeat and synchronization and virtual IP
service monitor

HA settings, with the exception of Virtual IP Address settings, are not synchronized and must be configured separately on each primary and secondary unit.

You must maintain the physical link between the heartbeat and synchronization network interfaces. These connections enable a group member to detect the responsiveness of the other member, and to synchronize data. If they are interrupted, normal operation will be interrupted and a failover will occur. For more information on heartbeat and synchronization, see About the heartbeat and synchronization.

You can directly connect the heartbeat network interfaces of two FortiVoice units using a crossover Ethernet cable.

To configure HA options

Go to System > High Availability > Configuration.
Configure the following sections, as applicable:
- Configuring the primary HA options
- Configuring HA advanced options
- Configuring interface monitoring
- Configuring service-based monitoring
Click Apply.

Configuring the primary HA options

Go to System > High Availability > Configuration and click the arrow to expand the HA Configuration section, if needed.

GUI field	Description
Mode of operation	Enables or disables HA, and selects the initial configured role of this FortiVoice unit in the HA group. Off: The FortiVoice unit is not operating in HA mode. Master: The FortiVoice unit is the primary unit in an active-passive HA group. Slave: The FortiVoice unit is the secondary unit in an active-passive HA group.
On failure	Select how the primary unit behaves when it detects a failure, such as a power failure or a failure reported by service/interface monitoring. Switch Off: Do not process phone calls or join the HA group until you manually select the effective operating mode (see Click HERE to Start a Configuration/Data Sync and Click HERE to Restore Configured Operating Mode). Wait for Recovery Then Restore Original Role: On recovery, the failed primary unit resumes its configured primary role, which means that the secondary unit must hand the primary role back to it. This behavior may be useful if the cause of failure is temporary and rare, but may cause problems if the cause of failure is permanent or persistent. Wait for Recovery Then Restore Slave Role: On recovery, the failed primary unit’s effective HA mode of operation becomes slave, and the secondary unit keeps the master role. The recovered unit then synchronizes with the current primary unit, which continues to process phone calls. For information on manually restoring the FortiVoice unit to acting in its configured HA mode of operation, see Click HERE to Restore Configured Operating Mode. In most cases, you should select the Wait for Recovery Then Restore Slave Role option. For details on the effects of this option on the Effective Operating Mode, see Combinations of configured and effective HA modes of operation. For information on configuring service/interface monitoring, see Configuring service-based monitoring. This option appears only if Mode of operation is master.
Shared password	Enter an HA password for the HA group. You must configure the same Shared password value on both the primary and secondary units.
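
To make the effect of the On failure options concrete, the following Python sketch maps each option to the effective operating mode that the failed primary unit would be expected to have once it is back in operation. This is one reading of the descriptions above, not the product's implementation, and the function name is hypothetical.

```python
# Effective mode of the configured primary unit after it recovers,
# according to the On failure descriptions (an interpretation, not an API).
ON_FAILURE_RESULT = {
    "Switch Off": "off",
    "Wait for Recovery Then Restore Original Role": "master",
    "Wait for Recovery Then Restore Slave Role": "slave",
}


def effective_mode_after_recovery(on_failure: str) -> str:
    """Return the expected effective mode for the selected On failure option."""
    try:
        return ON_FAILURE_RESULT[on_failure]
    except KeyError:
        raise ValueError(f"unknown On failure option: {on_failure!r}") from None
```

With Switch Off, the unit stays out of the HA group until an administrator intervenes, which is why the sketch reports `off` for that option.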

Configuring HA advanced options

Go to System > High Availability > Configuration > Advanced Options.

GUI item

Description

HA base port

Keep the default TCP port number (20000) that will be used for:

the heartbeat signal
synchronization control
data synchronization
configuration synchronization

In addition to configuring the heartbeat, you can configure service monitoring. For details, see Configuring service-based monitoring.

In addition to automatic immediate and periodic configuration synchronization, you can also manually initiate synchronization. For details, see Click HERE to Start a Configuration/Data Sync.

Heartbeat lost threshold

Enter the total span of time, in seconds, for which the primary unit can be unresponsive before it triggers a failover and the secondary unit assumes the role of the primary unit.

The heartbeat will continue to check for availability once per second. To prevent premature failover when the primary unit is simply experiencing very heavy load, configure a total threshold of three (3) seconds or more to allow the secondary unit enough time to confirm unresponsiveness by sending additional heartbeat signals.

If the failure detection time is too short, the secondary unit may falsely detect a failure during periods of high load.

If the failure detection time is too long, the primary unit could fail and a delay in detecting the failure could mean that a call is delayed or lost. Decrease the failure detection time if a call is delayed or lost because of an HA failover.

Remote services as heartbeat

Enable to use remote service monitoring as a secondary HA heartbeat. If enabled and both the primary and secondary heartbeat links fail or become disconnected, and remote service monitoring still detects that the primary unit is available, a failover will not occur.

The remote service check applies only to temporary heartbeat link failures. If the HA process restarts because of a system reboot or an HA daemon restart, the physical heartbeat connections are checked first. If no physical connection is found, remote service monitoring no longer takes effect.

Using remote services as heartbeat provides HA heartbeat only, not synchronization. To avoid synchronization problems, you should not use remote service monitoring as a heartbeat for extended periods. This feature is intended only as a temporary heartbeat solution that operates until you reestablish a normal primary or secondary heartbeat link.

Call recording sync

Select to sync recorded calls.

Survivability service interface

Select the interface port for a local survivable gateway (LSG) to communicate with this FortiVoice unit.

In an LSG setup, when the central FortiVoice HA is enabled without a virtual IP, the primary and secondary units need to identify their service interface ports for the LSG to communicate with them.

For more information about LSG, see FortiVoice Local Survivable Gateway Deployment Guide.

In all other cases, the system ignores this value.

Primary Override External Media Host

Enter the host/IP address to override the default external host/IP address for media stream on the primary HA unit.

Secondary Override External Media Host

Enter the host/IP address to override the default external host/IP address for media stream on the secondary HA unit.
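
The interaction between the Heartbeat lost threshold and Remote services as heartbeat options described above can be summarized as a small decision function. The Python sketch below is a simplified model under stated assumptions: the class and field names are hypothetical, and treating an elapsed time equal to the threshold as "lost" is an assumption rather than documented behavior.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatView:
    """What the secondary unit knows at one moment (hypothetical model)."""
    seconds_since_last_heartbeat: float  # over the physical heartbeat links
    lost_threshold_seconds: float        # the Heartbeat lost threshold value
    remote_services_as_heartbeat: bool   # the Remote services as heartbeat option
    remote_service_reachable: bool       # result of remote service monitoring


def should_fail_over(view: HeartbeatView) -> bool:
    """Decide whether the secondary unit should take over the primary role."""
    # The heartbeat is still within the tolerated silence: no failover.
    if view.seconds_since_last_heartbeat < view.lost_threshold_seconds:
        return False
    # The heartbeat is lost, but remote service monitoring still sees the
    # primary unit and is allowed to stand in as a heartbeat.
    if view.remote_services_as_heartbeat and view.remote_service_reachable:
        return False
    return True
```

The sketch covers only the heartbeat decision. It does not model synchronization, which remote service monitoring does not provide.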

Configuring interface monitoring

Interface monitor checks the local interfaces on the primary unit. If a malfunctioning interface is detected, a failover will be triggered.

To configure interface monitoring

Go to System > High Availability > Configuration.
Select master or slave as the mode of operation.

Expand the Interface area, if required.

The interface IP address must be different from, but on the same subnet as, the IP address of the other heartbeat network interface of the other member in the HA group.

When configuring the other FortiVoice unit in the HA group, use this value as the remote peer IP.

Select a row in the table and click Edit to configure the following HA settings on the interface.

GUI item

Description

Port

Displays the name of the interface that you are configuring.

Enable port monitor

Enable to monitor a network interface for failure. If the port fails, the primary unit will trigger a failover.

Heartbeat status

Specify if this interface will be used for HA heartbeat and synchronization.

Disable

Do not use this interface for HA heartbeat and synchronization.

Primary

Select the primary network interface for heartbeat and synchronization traffic. For more information, see About the heartbeat and synchronization.

This network interface must be connected directly or through a switch to the Primary heartbeat network interface of the other member in the HA group.

Secondary

Select the secondary network interface for heartbeat and synchronization traffic. For more information, see About the heartbeat and synchronization.

The secondary heartbeat interface is the backup heartbeat link between the units in the HA group. If the primary heartbeat link is functioning, the secondary heartbeat link is used for the HA heartbeat. If the primary heartbeat link fails, the secondary link is used for the HA heartbeat and for HA synchronization.

This network interface must be connected directly or through a switch to the Secondary heartbeat network interface of the other member in the HA group.

Using the same network interface for both HA synchronization/heartbeat traffic and other network traffic could result in issues with heartbeat and synchronization during times of high traffic load, and is not recommended.

In general, you should isolate the network interfaces that are used for heartbeat traffic from your overall network. Heartbeat and synchronization packets contain sensitive configuration information, are latency-sensitive, and can consume considerable network bandwidth.

Peer IP address

Enter the IP address of the matching heartbeat network interface of the other member of the HA group.

For example, if you are configuring the primary unit’s primary heartbeat network interface, enter the IP address of the secondary unit’s primary heartbeat network interface.

Similarly, for the secondary heartbeat network interface, enter the IP address of the other unit’s secondary heartbeat network interface.

For information about configuration synchronization and what is not synchronized, see About the heartbeat and synchronization.

Peer IPv6 address

Enter the peer IPv6 address for this interface.

Virtual IP action

Select whether and how to configure the IP addresses and netmasks of the FortiVoice unit whose effective HA mode of operation is currently master.

For example, a primary unit might be configured to receive phone call traffic through port1 and receive heartbeat and synchronization traffic through port3 and port4. In that case, you would configure the primary unit to set the IP addresses or add virtual IP addresses for port1 of the secondary unit on failover in order to mimic that of the primary unit.

Ignore: Do not change the network interface configuration on failover, and do not monitor. For details on service monitoring for network interfaces, see Configuring service-based monitoring.
Use: Add the specified virtual IP address and netmask to the network interface on failover. Normally, you will configure your network so that clients use the virtual IP address. This option results in the network interface having two IP addresses: the actual and the virtual.

Virtual IP address

Enter the virtual IPv4 address for this interface.

Virtual IPv6 address

Enter the virtual IPv6 address for this interface.

Click OK.
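
The addressing rule stated in the procedure above (the local heartbeat interface address must differ from the peer address but be on the same subnet) can be checked before you enter the values. The following Python sketch uses the standard `ipaddress` module. The function name and the example addresses are illustrative only.

```python
import ipaddress


def check_heartbeat_addresses(local: str, peer: str) -> list:
    """Check a local 'address/prefix' against a peer address.

    Returns a list of problems; an empty list means the pair looks valid.
    """
    problems = []
    local_if = ipaddress.ip_interface(local)  # for example "10.0.0.1/24"
    peer_ip = ipaddress.ip_address(peer)      # for example "10.0.0.2"
    if local_if.ip == peer_ip:
        problems.append("local and peer addresses must be different")
    if peer_ip not in local_if.network:
        problems.append("peer address is not on the local interface's subnet")
    return problems
```

For example, `check_heartbeat_addresses("10.0.0.1/24", "10.0.0.2")` returns an empty list, while a peer address of `10.0.1.2` is reported as being outside the subnet.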

Configuring service-based monitoring

Go to System > High Availability > Configuration to configure remote service monitoring, local network interface monitoring, and local hard drive monitoring.

HA service monitoring settings are not synchronized and must be configured separately on each primary and secondary unit.

With remote service monitoring, the secondary unit confirms that it can connect to the primary unit over the network using SIP and HTTP connections.

With local network interface monitoring and local hard drive monitoring, the primary unit monitors its own network interfaces and hard drives.

If service monitoring detects a failure, the effective HA operating mode of the primary unit switches to off or failed (depending on the On failure setting). A failover then occurs, and the effective HA operating mode of the secondary unit switches to master. For information on the On failure option, see Configuring the HA mode and group. For information on the effective HA operating mode, see Monitoring the HA status.

To configure service monitoring

Go to System > High Availability > Configuration.
Select master or slave as the mode of operation.
Expand Service Monitor, if required.
Select a row in the table and click Edit to configure it.

For Remote HTTP, configure the following:

GUI item	Description
Enable	Select to enable connection responsiveness tests for HTTP.
Name	Displays the service name.
Remote IP	Enter the peer IP address.
Port	Enter the port number of the peer HTTP service.
Timeout	Enter the timeout period for one connection test.
Interval	Enter the frequency of the tests.
Retries	Enter the number of consecutively failed tests that are allowed before the primary unit is deemed unresponsive and a failover occurs.

For SIP UDP, configure the following:

GUI Item	Description
Enable	Select to enable connection responsiveness tests for SIP UDP.
Name	Displays the service name.
Remote IP	Enter the peer IP address.
Port	Enter the port number of the peer SIP UDP service.
Timeout	Enter the timeout period for one connection test.
Interval	Enter the frequency of the tests.
Retries	Enter the number of consecutively failed tests that are allowed before the primary unit is deemed unresponsive and a failover occurs.
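
The Retries setting counts consecutive failed tests, so a single successful test resets the count. The Python sketch below illustrates that behavior. It assumes that the configured number of consecutive failures is tolerated and that the next failure triggers the failover; the text above does not state whether the boundary is inclusive, so treat that detail as an assumption. The function name is hypothetical.

```python
from typing import Iterable, Optional


def failover_test_index(results: Iterable[bool], retries: int) -> Optional[int]:
    """Return the index of the test that triggers a failover, or None.

    results: one boolean per test, True when the test succeeded.
    retries: number of consecutive failed tests that are tolerated (assumed).
    """
    consecutive_failures = 0
    for index, succeeded in enumerate(results):
        if succeeded:
            consecutive_failures = 0  # any success resets the count
            continue
        consecutive_failures += 1
        if consecutive_failures > retries:
            return index
    return None
```

Under the same assumption, the time needed to detect a failure grows with both Interval and Retries, so larger values trade faster detection for fewer false failovers.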

For Interface monitor and Local hard drives, configure the following:

GUI item	Description
Enable	Select to enable local hard drive monitoring. Interface monitoring is enabled when you configure interface monitoring (see Configuring interface monitoring). Network interface monitoring tests all active network interfaces whose Virtual IP action setting is not Ignore and for which port monitoring is enabled.
Interval	Enter the frequency of the test.
Retries	Specify the number of consecutively failed tests that are allowed before the local interface or hard drive is deemed unresponsive and a failover occurs.

Failover scenario examples:

This section describes basic FortiVoice active-passive HA failover scenarios. For each scenario, refer to the HA group shown in Example active-passive HA group. To simplify the descriptions of these scenarios, the following abbreviations are used:

P1 is the configured primary unit.
S2 is the configured secondary unit.

Example active-passive HA group

This section contains the following HA failover scenarios:

Failover scenario 1: Temporary failure of the primary unit
Failover scenario 2: System reboot or reload of the primary unit
Failover scenario 3: System reboot or reload of the secondary unit
Failover scenario 4: System shutdown of the secondary unit
Failover scenario 5: Primary heartbeat link fails
Failover scenario 6: Network connection between primary and secondary units fails (remote service monitoring detects a failure)

Failover scenario 1: Temporary failure of the primary unit

In this scenario, the primary unit (P1) fails because of a software failure or a recoverable hardware failure (in this example, the P1 power cable is unplugged). HA logging and alert email are configured for the HA group.

When the secondary unit (S2) detects that P1 has failed, S2 becomes the new primary unit and continues processing phone calls.

There is no data loss when failover happens although active calls are disconnected and line appearance and extension appearance take time to restore. Call data consists of the FortiVoice call detailed records, recorded calls, voicemail, call directories, fax, and voice prompts. The user portal is not affected.

Here is what happens during this process:

The FortiVoice HA group is operating normally.
The power is accidentally disconnected from P1.
S2’s heartbeat test detects that P1 has failed.
How soon this happens depends on the HA daemon configuration of S2.
The effective HA operating mode of S2 changes to master.
S2 sends an alert email similar to the following, indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to master.
This is the HA machine at 172.16.5.11.

The following event has occurred
‘MASTER heartbeat disappeared’
The state changed from ‘SLAVE’ to ‘MASTER’
S2 records event log messages (among others) indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to master.
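
If you forward HA alert email to a script, the fields in a message like the one above can be extracted with regular expressions. The following Python sketch is based only on the sample messages shown in this section; the names HaAlert and parse_ha_alert are hypothetical, and the exact wording of alert email on your unit may differ.

```python
import re
from typing import NamedTuple, Optional

# Straight and curly single quotes, both of which appear in the sample messages.
_Q = "'\u2018\u2019"

_IP = re.compile(r"HA machine at (\d{1,3}(?:\.\d{1,3}){3})")
_EVENT = re.compile(r"event has occurred\s*[" + _Q + r"]([^" + _Q + r"\n]+)")
_STATE = re.compile(
    r"state changed from [" + _Q + r"](\w+)[" + _Q + r"]"
    r" to [" + _Q + r"](\w+)[" + _Q + r"]"
)


class HaAlert(NamedTuple):
    """Fields recovered from one alert message (hypothetical structure)."""

    unit_ip: str
    event: Optional[str]      # None for "critical event" messages
    old_state: Optional[str]  # None if the message reports no state change
    new_state: Optional[str]


def parse_ha_alert(text: str) -> HaAlert:
    """Parse an HA alert email body shaped like the samples in this section."""
    ip = _IP.search(text)
    if ip is None:
        raise ValueError("text does not look like an HA alert message")
    event = _EVENT.search(text)
    state = _STATE.search(text)
    return HaAlert(
        unit_ip=ip.group(1),
        event=event.group(1).strip() if event else None,
        old_state=state.group(1) if state else None,
        new_state=state.group(2) if state else None,
    )
```

Messages that report a critical event without a state change, such as a reboot notice, yield None for the event and state fields in this sketch.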

Recovering from temporary failure of the primary unit

After P1 recovers from the hardware failure, what happens next to the HA group depends on P1’s HA On failure setting under System > High Availability > Configuration.

HA On Failure setting

Switch Off
P1 will not process calls or join the HA group until you manually select the effective HA operating mode (see Click HERE to Restore Configured Operating Mode).
Wait for Recovery Then Restore Original Role
On recovery, P1’s effective HA operating mode resumes its configured primary role. This also means that S2 needs to give back the primary role to P1. This behavior may be useful if the cause of failure is temporary and rare, but may cause problems if the cause of failure is permanent or persistent.

In this case, S2 sends another alert email similar to the following:

This is the HA machine at 172.16.5.11.

The following event has occurred
‘SLAVE asks us to switch roles (recovery after a restart)’
The state changed from ‘MASTER’ to ‘SLAVE’

After recovery, P1 also sends out an alert email similar to the following:

This is the HA machine at 172.16.5.10.

The following critical event was detected
The system was shutdown!
Wait for Recovery Then Restore Slave Role
On recovery, P1’s effective HA operating mode becomes slave, and S2 continues to assume the master role. P1 then synchronizes with the current primary unit, S2. For information on manually restoring the FortiVoice unit to acting in its configured HA mode of operation, see Click HERE to Restore Configured Operating Mode.
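
The three On Failure options can be summarized as a small lookup. The Python sketch below only restates the behavior described above; the enum OnFailure and the function effective_mode_after_recovery are hypothetical names, and treating Switch Off as an effective mode of "off" is an assumption drawn from the description of that setting.

```python
from enum import Enum


class OnFailure(Enum):
    """The three On Failure options described in this section."""

    SWITCH_OFF = "switch off"
    RESTORE_ORIGINAL_ROLE = "wait for recovery then restore original role"
    RESTORE_SLAVE_ROLE = "wait for recovery then restore slave role"


def effective_mode_after_recovery(configured_mode: str, on_failure: OnFailure) -> str:
    """Return the effective HA operating mode of a unit once it has recovered.

    configured_mode is the unit's configured mode of operation,
    "master" or "slave".
    """
    if on_failure is OnFailure.SWITCH_OFF:
        # The unit stays out of the HA group until an administrator intervenes.
        return "off"
    if on_failure is OnFailure.RESTORE_ORIGINAL_ROLE:
        # The unit takes back its configured role; the peer gives it up.
        return configured_mode
    # RESTORE_SLAVE_ROLE: the unit rejoins as secondary and synchronizes
    # with whichever unit is currently primary.
    return "slave"
```

For P1 in this scenario, the configured mode is master, so the three settings yield off, master, and slave respectively.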

Failover scenario 2: System reboot or reload of the primary unit

You may need to reboot or reload (not shut down) P1, for example for a firmware upgrade or a process restart. When you do so by using the CLI commands execute reboot or execute reload, or by clicking the Restart button under Status > Dashboard > System Command in the GUI, the following occurs:

P1 will send a holdoff command to S2 so that S2 will not take over the primary role during P1’s reboot.
P1 will also send out an alert email similar to the following:
This is the HA machine at 172.16.5.10.

The following critical event was detected
The system is rebooting (or reloading)!
S2 will hold off checking the services and heartbeat with P1. Note that S2 holds off for only about 5 minutes. If P1 does not come back up within that time, S2 takes over the primary role.
S2 will send out an alert email, indicating that S2 received the holdoff command from P1.
This is the HA machine at 172.16.5.11.

The following event has occurred
‘peer rebooting (or reloading)’
The state changed from ‘SLAVE’ to ‘HOLD_OFF’

After P1 is up again:

P1 will send another command to S2 and ask S2 to change its state from holdoff to slave and resume monitoring P1’s services and heartbeat.
S2 will send out an alert email, indicating that S2 received instruction commands from P1.
This is the HA machine at 172.16.5.11.

The following event has occurred
‘peer command appeared’
The state changed from ‘HOLD_OFF’ to ‘SLAVE’.
S2 logs the event in the HA logs.
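
The hold-off behavior of S2 in this scenario can be pictured as a small state machine. The Python sketch below is a simplified model, not FortiVoice code: the class SecondaryUnit and its methods are hypothetical, time is passed in explicitly so the model is easy to test, and the 300-second limit stands in for the "about 5 minutes" mentioned above.

```python
from typing import Optional

# "About 5 minutes" per the text; the exact value is an assumption.
HOLD_OFF_LIMIT_SECONDS = 5 * 60


class SecondaryUnit:
    """Hypothetical model of S2's effective state during a reboot of P1."""

    def __init__(self) -> None:
        self.state = "SLAVE"
        self.hold_off_started: Optional[float] = None

    def on_peer_rebooting(self, now: float) -> None:
        """P1 sent the holdoff command before rebooting or reloading."""
        if self.state == "SLAVE":
            self.state = "HOLD_OFF"
            self.hold_off_started = now

    def on_peer_command(self, now: float) -> None:
        """P1 is up again and asks S2 to resume monitoring."""
        if self.state == "HOLD_OFF":
            self.state = "SLAVE"
            self.hold_off_started = None

    def tick(self, now: float) -> None:
        """Periodic check: take over if P1 never came back in time."""
        if (
            self.state == "HOLD_OFF"
            and self.hold_off_started is not None
            and now - self.hold_off_started >= HOLD_OFF_LIMIT_SECONDS
        ):
            self.state = "MASTER"
            self.hold_off_started = None
```

In this model, a reboot that completes within the limit leaves S2 as the secondary unit, while a reboot that never completes ends with S2 as the primary unit.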

Failover scenario 3: System reboot or reload of the secondary unit

If you need to reboot or reload (not shut down) S2 for any reason, such as a firmware upgrade or a process restart, by using the CLI commands execute reboot or execute reload, or by clicking the Restart button under Monitor > System Status > Status on the GUI, the behavior of P1 and S2 is as follows:

P1 will send out an alert email similar to the following, informing the administrator of the heartbeat loss with S2.
This is the HA machine at 172.16.5.10.

The following event has occurred
‘ha: SLAVE heartbeat disappeared’
S2 will send out an alert email similar to the following:
This is the HA machine at 172.16.5.11.

The following critical event was detected
The system is rebooting (or reloading)!
P1 will also log this event in the HA logs.

Failover scenario 4: System shutdown of the secondary unit

If you shut down S2:

No alert email is sent out from either P1 or S2.
P1 will log this event in the HA logs.

Failover scenario 5: Primary heartbeat link fails

If the primary heartbeat link fails, such as when the cable is accidentally disconnected, and you have not configured a secondary heartbeat link, the FortiVoice units in the HA group cannot verify that the other unit is operating, and each assumes that the other has failed. As a result, the secondary unit (S2) changes to operating as a primary unit, and both FortiVoice units act as primary units.

Two primary units connected to the same network may cause address conflicts on your network. Additionally, because the heartbeat link is interrupted, the FortiVoice units in the HA group cannot synchronize configuration changes or voice data changes.

Even after reconnecting the heartbeat link, both units will continue operating as primary units. To return the HA group to normal operation, you must connect to the web‑based manager of S2 to restore its effective HA operating mode to slave (secondary unit).

The FortiVoice HA group is operating normally.
The heartbeat link Ethernet cable is accidentally disconnected.
S2’s HA heartbeat test detects that the primary unit has failed.
How soon this happens depends on the HA daemon configuration of S2.
The effective HA operating mode of S2 changes to master.
S2 sends an alert email similar to the following, indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to master.
This is the HA machine at 172.16.5.11.

The following event has occurred
‘MASTER heartbeat disappeared’
The state changed from ‘SLAVE’ to ‘MASTER’
S2 records event log messages (among others) indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to master.
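
The condition that results from these steps, with both units acting as primary, can be checked mechanically if you collect each unit's effective HA operating mode, for example by reading it from System > High Availability > Status on each unit. The Python sketch below assumes you have already gathered those values into a dictionary; the function names and the input format are hypothetical.

```python
from typing import Dict, List


def acting_primaries(effective_modes: Dict[str, str]) -> List[str]:
    """Return the names of units whose effective HA operating mode is master."""
    return sorted(
        unit for unit, mode in effective_modes.items() if mode.lower() == "master"
    )


def both_units_primary(effective_modes: Dict[str, str]) -> bool:
    """True when more than one unit in the HA group is acting as primary."""
    return len(acting_primaries(effective_modes)) > 1
```

A result of True for an input such as {"P1": "master", "S2": "master"} corresponds to the situation described in this scenario, in which address conflicts and unsynchronized changes are possible.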

Recovering from a heartbeat link failure

Because the hardware failure is not permanent (that is, the failure of the heartbeat link was caused by a disconnected cable, not a failed port on one of the FortiVoice units), you may want to return both FortiVoice units to operating in their configured modes when rejoining the failed primary unit to the HA group.

To return to normal operation after the heartbeat link fails

Reconnect the primary heartbeat interface by reconnecting the heartbeat link Ethernet cable.
Even though the effective HA operating mode of S2 is master, S2 continues to attempt to find the other primary unit. When the heartbeat link is reconnected, S2 finds P1 and determines that P1 is also operating as a primary unit. So S2 sends a heartbeat signal to notify P1 to stop operating as a primary unit. The effective HA operating mode of P1 changes to off.
P1 sends an alert email similar to the following, indicating that P1 has stopped operating as the primary unit.
This is the HA machine at 172.16.5.10.

The following event has occurred
'SLAVE asks us to switch roles (user requested takeover)'
The state changed from 'MASTER' to 'OFF'
P1 records event log messages (among others) indicating that P1 is switching to off mode.
The configured HA mode of operation of P1 is master and the effective HA operating mode of P1 is off.

The configured HA mode of operation of S2 is slave and the effective HA operating mode of S2 is master.
Connect to the web‑based manager of P1, go to System > High Availability > Status.
Check for synchronization messages.
Do not proceed to the next step until P1 has synchronized with S2.
Connect to the web‑based manager of S2, go to System > High Availability > Status and select click HERE to restore configured operating mode.
The HA group should return to normal operation. P1 records the event log message (among others) indicating that S2 asked P1 to return to operating as the primary unit.

P1 and S2 synchronize again. P1 processes phone calls normally.

Failover scenario 6: Network connection between primary and secondary units fails (remote service monitoring detects a failure)

Depending on your network configuration, the network connection between the primary and secondary units can fail for a number of reasons. In the network configuration shown in Example active-passive HA group, the connection between port1 of primary unit (P1) and port1 of the secondary unit (S2) can fail if a network cable is disconnected or if the switch between P1 and S2 fails.

A more complex network configuration could include a number of network devices between the primary and secondary unit’s non-heartbeat network interfaces. In any configuration, remote service monitoring can only detect a communication failure. Remote service monitoring cannot determine where the failure occurred or the reason for the failure.

In this scenario, remote service monitoring has been configured to make sure that S2 can connect to P1. The On failure setting located in the HA main configuration section is wait for recovery then restore slave role. For information on the On failure setting, see On failure. For information about remote service monitoring, see Configuring service-based monitoring.

The failure occurs when power to the switch that connects the P1 and S2 port1 interfaces is disconnected. Remote service monitoring detects the failure of the network connection between the primary and secondary units. Because of the On failure setting, P1 changes its effective HA operating mode to failed.

When the failure is corrected, P1 detects the correction because while operating in failed mode P1 has been attempting to connect to S2 using the port1 interface. When P1 can connect to S2, the effective HA operating mode of P1 changes to slave and the voice data on P1 will be synchronized to S2. S2 can now deliver the calls. The HA group continues to operate in this manner until an administrator resets the effective HA modes of operation of the FortiVoice units.

The FortiVoice HA group is operating normally.
The power cable for the switch between P1 and S2 is accidentally disconnected.
S2’s remote service monitoring cannot connect to the primary unit.
How soon this happens depends on the remote service monitoring configuration of S2.
Through the HA heartbeat link, S2 signals P1 to stop operating as the primary unit.
The effective HA operating mode of P1 changes to failed.
The effective HA operating mode of S2 changes to master.
S2 sends an alert email similar to the following, indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to master.
This is the HA machine at 172.16.5.11.

The following event has occurred
‘MASTER remote service disappeared’
The state changed from ‘SLAVE’ to ‘MASTER’
S2 logs the event (among others) indicating that S2 has determined that P1 has failed and that S2 is switching its effective HA operating mode to master.
P1 sends an alert email similar to the following, indicating that P1 has stopped operating in HA mode.
This is the HA machine at 172.16.5.10.

The following event has occurred
'SLAVE asks us to switch roles (user requested takeover)'

The state changed from 'MASTER' to 'FAILED'
P1 records the log messages (among others) indicating that P1 is switching to Failed mode.

Recovering from a network connection failure

Because the network connection failure was not caused by failure of either FortiVoice unit, you may want to return both FortiVoice units to operating in their configured modes when rejoining the failed primary unit to the HA group.

To return to normal operation after the network connection fails

Reconnect power to the switch.
Because the effective HA operating mode of P1 is failed, P1 is using remote service monitoring to attempt to connect to S2 through the switch.
When the switch resumes operating, P1 successfully connects to S2.
P1 determines that S2 can connect to the network and process calls.
The effective HA operating mode of P1 switches to slave.
P1 logs the event.
P1 sends an alert email similar to the following, indicating that P1 is switching its effective HA operating mode to slave.
This is the HA machine at 172.16.5.10.

The following event has occurred
'SLAVE asks us to switch roles (user requested takeover)'

The state changed from 'FAILED' to 'SLAVE'
Connect to the web‑based manager of P1 and go to System > High Availability > Status.
Check for synchronization messages.
Do not proceed to the next step until P1 has synchronized with S2.
Connect to the web‑based manager of S2, go to System > High Availability > Status and select click HERE to restore configured operating mode.
Connect to the web‑based manager of P1, go to System > High Availability > Status and select click HERE to restore configured operating mode.
P1 should return to operating as the primary unit and S2 should return to operating as the secondary unit.

P1 and S2 synchronize again. P1 can now process phone calls normally.