High Availability and Disaster Recovery Procedures - ClickHouse


6.7.0

High Availability Operations

Leader Node Fails
Follower Node Fails
Add a Failed Supervisor back to Cluster

Leader Node Fails

If the Leader node fails, for example because of a hardware issue, take the following steps.

Step 1: Promote Follower1 to be the new Leader by following these steps.

SSH to Follower1 and run the following command, where <ownIP> is Follower1's own IP address.

phfollower2primary <ownIP>

After the script finishes, Follower1 is the new Leader and the chain becomes: Leader (old Follower1) -> Follower2 -> Follower3
Log in to the GUI. The Load Balancer may route you to any Follower.
Navigate to ADMIN > License > Nodes.
Select the old Leader node and click Delete.
Click Yes to confirm.
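
The command-line part of Step 1 can be sketched as a small dry-run helper. This is only an illustration, not part of the product: the function names is_ipv4 and promote_cmd, the FOLLOWER1_IP variable, and the 192.0.2.11 example address are all hypothetical. The helper checks that the argument looks like an IPv4 address and prints the phfollower2primary command for review instead of running it.

```shell
#!/bin/sh
# Sketch only: prints the promotion command for review; it does not run it.
# FOLLOWER1_IP is a hypothetical placeholder; replace it with Follower1's own IP.

# is_ipv4: succeed only if $1 is a dotted-quad IPv4 address with octets 0-255.
is_ipv4() {
    case "$1" in
        *[!0-9.]*|""|.*|*.|*..*) return 1 ;;
    esac
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address on dots into positional parameters
    IFS=$old_ifs
    [ "$#" -eq 4 ] || return 1
    for octet in "$@"; do
        [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || return 1
    done
    return 0
}

# promote_cmd: print the command to run on Follower1, or fail on a bad IP.
promote_cmd() {
    is_ipv4 "$1" || { echo "invalid IPv4 address: $1" >&2; return 1; }
    echo "phfollower2primary $1"
}

promote_cmd "${FOLLOWER1_IP:-192.0.2.11}"
```

Printing the command rather than executing it lets you confirm the IP address before running the promotion on Follower1.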

Step 2: If Disaster Recovery is enabled, update the Leader information for the Secondary node.

Log in to the (new Leader) Follower1 GUI.
Navigate to ADMIN > License > Nodes.
Select the Secondary node and click Edit.
Enter the (new Leader) Follower1 information in the Primary column.
1. Change the Host Name field to the Follower1 Host Name.
2. Change the IP Address field to the Follower1 IP Address.
3. Do NOT change the License UUID yet. Update it in Step 4, after you have installed a new license with Follower1's UUID in Step 3.
4. Set SSH Parameters (SSH Public Key, SSH Private Key Path) to that of Follower1.
5. Click Save.

Step 3: Install a new license with the new Leader (Follower1) UUID.

Because the license is tied to the failed Leader’s UUID, you will repeatedly see a message prompting you to install a new license with the new Leader’s UUID within a 2-week grace period from the time of failure. To resolve this, take the following steps.

Log in to the (new Leader) Follower1 GUI. If you go through the Load Balancer, you may end up in the Follower2 GUI, which does not allow this operation.
Navigate to ADMIN > License > General.
Click Upload and provide the license file that matches the (new Leader) Follower1’s UUID.

Note: During the 2-week grace period, while the Primary Leader's UUID does not match the license, you cannot add new Followers to the system.

Step 4: If Disaster Recovery is enabled, update the Licensed Primary UUID for the Secondary node.

Log in to the (new Leader) Follower1 GUI.
Navigate to ADMIN > License > Nodes.
Select the Secondary node and click Edit.
Update the Licensed UUID for the Primary node.

Follower Node Fails

If any Follower node fails, take the following steps.

Step 1: Remove the failed Follower node from the Cluster.

Log in to the GUI and navigate to ADMIN > License > Nodes.
Select the failed node.
Click Delete.

Add a Failed Supervisor back to Cluster

To add a failed Supervisor back to the Cluster, follow these steps.

On the failed Supervisor, navigate to /opt/phoenix/deployment/jumpbox.
Clean the state data by running the following script with the Supervisor's own IP address.

phresetclusternode <myip>

Note: After completion, this script will reboot your appliance.
Add the node as a Follower by following the steps in Add Primary Follower.
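
The reset steps above can be sketched as a dry-run helper that prints the commands only after the operator acknowledges the reboot. This is an illustration under stated assumptions: the reset_plan function, the NODE_IP and ACK_REBOOT variables, and the 192.0.2.12 example address are hypothetical. Only the directory and the phresetclusternode command come from the procedure itself.

```shell
#!/bin/sh
# Sketch only: prints the reset commands for review; it does not run them.
# NODE_IP and ACK_REBOOT are hypothetical names used for illustration.

JUMPBOX_DIR=/opt/phoenix/deployment/jumpbox   # directory named in the steps above

# reset_plan: print the commands for a node IP, but only once the operator
# has acknowledged (second argument "yes") that the script reboots the node.
reset_plan() {
    node_ip=$1
    ack=$2
    [ -n "$node_ip" ] || { echo "node IP is required" >&2; return 1; }
    [ "$ack" = "yes" ] || { echo "refusing: phresetclusternode reboots the appliance" >&2; return 1; }
    echo "cd $JUMPBOX_DIR"
    echo "phresetclusternode $node_ip"
}

reset_plan "${NODE_IP:-192.0.2.12}" "${ACK_REBOOT:-yes}"
```

The explicit acknowledgement is a design choice for the sketch: because the script reboots the appliance when it completes, it helps to confirm you are on the intended node before running anything.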
