The primary (master) node and secondary (primary slave) node send heartbeats to each other to detect whether their peer is alive. If the primary node is not accessible, such as during a reboot, a failover occurs. You can also configure a ping server, a TCP echo server, or both, so that the unit regularly checks its network condition and downgrades itself to the secondary (primary slave) type to trigger a failover. In a failover, the secondary and primary switch roles and the cluster IP addresses change, as indicated by the boxes in the lower image.
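The heartbeat check described above can be pictured with a minimal sketch. Everything named here is hypothetical: the `PeerMonitor` class, the `timeout_s` parameter, and the timeout value are illustrative assumptions, not the product's actual implementation or settings.

```python
from dataclasses import dataclass


@dataclass
class PeerMonitor:
    """Tracks the last heartbeat seen from a peer node (illustrative only)."""

    timeout_s: float        # assumed: silence longer than this means the peer is down
    last_seen: float = 0.0  # timestamp of the most recent heartbeat

    def record_heartbeat(self, now: float) -> None:
        """Remember when the peer was last heard from."""
        self.last_seen = now

    def peer_alive(self, now: float) -> bool:
        """The peer counts as alive while its last heartbeat is recent enough."""
        return (now - self.last_seen) <= self.timeout_s


def should_take_over(monitor: PeerMonitor, now: float) -> bool:
    """A secondary would start a failover once the primary has gone silent."""
    return not monitor.peer_alive(now)
```

Passing the current time in as an argument, rather than reading a clock inside the class, keeps the sketch easy to test with fixed timestamps.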
The failover logic handles two different scenarios:
Objective node available
The objective node is a slave node (either secondary or worker) that can decide the new primary. For example, if a cluster consists of one primary node, one secondary node, and one worker node, the worker node is the objective node.
After a secondary node takes over the primary role, the original primary node will accept the decision when it is back online.
After the original primary is back online, it will become a secondary node.
No objective node available
When there is no objective node in the cluster, the cluster topology is not stable and the failover process may take several rounds of role changes. This occurs when there is no communication between nodes because the cluster's internal communication is down. During the failover process, the final roles of primary and secondary are decided by three principal factors: the internal connections, the health check, and the serial number.
The internal connections in a cluster involve two ports: port1 and the cluster internal port, typically port2 depending on your configuration.
Port1 is used when a node promotes itself to be the primary and needs confirmation from other nodes.
The cluster internal port is used by each cluster node to detect whether its connection to the other nodes in the cluster is available, and to ask the secondary to fail over when the node's own health check fails.
The health check is used to check the connection with the ping server. If this connection fails in the primary node, it triggers a failover.
Once the port1 connection is recovered, the unit with the newer serial number will keep the primary role and the unit with the older serial number will become the secondary.
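The serial-number tie-break can be sketched as follows. This is only an illustration under a loud assumption: the text does not say how "newer" is determined, so the sketch models it as the lexicographically larger serial string. The function name `resolve_roles` and the sample serial numbers are hypothetical.

```python
from typing import Dict


def resolve_roles(serial_a: str, serial_b: str) -> Dict[str, str]:
    """Assign roles to two units that both hold the primary role.

    ASSUMPTION: "newer" is modelled as the lexicographically larger
    serial string; the real comparison rule is not given in the text.
    """
    if serial_a == serial_b:
        raise ValueError("serial numbers must be unique")
    if serial_a > serial_b:
        newer, older = serial_a, serial_b
    else:
        newer, older = serial_b, serial_a
    # The unit with the newer serial keeps the primary role.
    return {newer: "primary", older: "secondary"}
```

For example, `resolve_roles("SN0001", "SN0002")` maps `"SN0002"` to `"primary"` and `"SN0001"` to `"secondary"` under this assumed ordering.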
When the new primary is decided, it will:
- Build up the scan environment.
- Apply all the settings synchronized from the original primary except the port3 IP and the internal communication port IP of the original primary.
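The second step above amounts to filtering the synchronized configuration before applying it. A minimal sketch, assuming the settings can be viewed as a flat key/value mapping; the key names `port3_ip` and `internal_port_ip` are hypothetical, since the text does not describe the real configuration schema:

```python
from typing import Dict

# Hypothetical keys for the two addresses that stay with the original primary.
EXCLUDED_KEYS = {"port3_ip", "internal_port_ip"}


def settings_to_apply(synced: Dict[str, str]) -> Dict[str, str]:
    """Return the synchronized settings minus the excluded addresses."""
    return {key: value for key, value in synced.items() if key not in EXCLUDED_KEYS}
```

Keeping the excluded keys in one set makes it clear which values the new primary never inherits.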
After a failover occurs, the original primary might become a secondary node.
It keeps its original port3 IP and internal cluster communication IP. All other interface ports are shut down as it becomes a worker node. Some functionality, such as email alerts, is turned off. If you want to reconfigure settings, such as the interface IP, you must do so through the CLI or the primary node's Central Management page.
Do not change the new primary's configuration before the old primary has returned online, because the configuration might be lost. If it is absolutely necessary to reconfigure the new primary, it is recommended to first remove the old primary from the cluster using the CLI command.
As the new primary takes over, the port that client devices communicate with switches to it. Because the new primary needs time to start all of its services, clients may experience a temporary service interruption.