
FortiSIEM Reference Architecture Using ClickHouse

ClickHouse Keeper Process Resilience

The ClickHouse keeper process is critical: it manages data insertion and replication within the ClickHouse cluster. If the keeper process is not functional, data cannot be written to the database, event ingestion stops, and the system falls back to a read-only state.
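
Because an unavailable keeper process halts ingestion, keeper liveness is worth monitoring directly. Below is a minimal Python sketch, assuming the keeper node exposes the ZooKeeper-compatible four-letter-word interface on its client port (9181 is the common default); the hostname is a placeholder, not a FortiSIEM-specific value.

    # Minimal liveness probe for a ClickHouse Keeper node. Keeper answers
    # ZooKeeper-style four-letter-word commands such as 'ruok' on its
    # client port (assumed 9181 here; adjust for your deployment).
    import socket

    def keeper_is_ok(host: str, port: int = 9181, timeout: float = 3.0) -> bool:
        """Send 'ruok' and expect 'imok' back if the keeper process is serving."""
        try:
            with socket.create_connection((host, port), timeout=timeout) as sock:
                sock.sendall(b"ruok")
                return sock.recv(4) == b"imok"
        except OSError:
            return False

    if __name__ == "__main__":
        # 'keeper1.example.com' is a placeholder hostname.
        print("keeper alive:", keeper_is_ok("keeper1.example.com"))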

In small deployments, the keeper process runs on the Supervisor node, sharing resources with other FortiSIEM processes. In mid-size and large deployments, the keeper process should run on separate node(s) so that it is not starved of resources by other processes; resource starvation can make the keeper process unavailable and push the system into read-only mode. Refer to the FortiSIEM Sizing Guide for the appropriate configuration for your events per second (EPS). While mid-size deployments can run with a single keeper node, deploying a three-node keeper cluster adds system resilience.

The keeper node cluster uses a quorum to maintain integrity: a strict majority of the nodes must remain available for the cluster to be sure of its integrity and continue accepting database writes. If a node failure leaves fewer than a majority, quorum is lost, the system enters read-only mode to protect integrity, and events are not written to the database. Keeper clusters should therefore contain either one keeper node, a cost-effective option with no keeper process resilience, or three keeper nodes, so that if one node fails the remaining two can still form a majority and maintain quorum.
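
The one-node and three-node recommendations follow directly from majority arithmetic. The short Python sketch below illustrates the rule; it is illustrative only, not FortiSIEM code.

    # Quorum arithmetic for a keeper ensemble of n nodes: writes continue
    # only while a strict majority of the nodes is available.
    def quorum(n: int) -> int:
        return n // 2 + 1              # smallest strict majority

    def tolerated_failures(n: int) -> int:
        return n - quorum(n)           # failures survivable with writes intact

    for n in range(1, 6):
        print(f"{n} node(s): quorum = {quorum(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
    # The output shows why two nodes add cost but no resilience (zero
    # failures tolerated) and why three nodes is the smallest resilient
    # ensemble (one failure tolerated).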

Other keeper cluster architectures are not recommended:

  • Deploying two keeper nodes does not provide automatic failover resilience. If either node in a two-node cluster fails, the surviving node cannot form a majority on its own, so the cluster loses quorum and database writes cease until the remaining keeper node is manually recovered with a CLI command (see the quorum state check sketched below).

  • Deploying more than three keeper nodes can reduce overall database insert performance: every write must be acknowledged by a majority of the ensemble, so a larger keeper cluster adds coordination overhead to each insert.

A cluster of three dedicated keeper nodes is the optimal configuration for most deployment scenarios that require keeper process resilience.
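
Once a multi-node keeper cluster is deployed, its quorum state can be observed from each node's reported role. The sketch below is a hedged example, assuming the same four-letter-word interface and default port as above, with placeholder hostnames: while quorum holds, the 'mntr' command reports zk_server_state as leader on one node and follower on the others.

    # Report each keeper node's role via the 'mntr' four-letter command.
    # While quorum holds, one node reports 'leader' and the rest 'follower';
    # a node that cannot join a quorum will not report a healthy role.
    import socket

    def keeper_state(host: str, port: int = 9181, timeout: float = 3.0) -> str:
        """Return the zk_server_state value from 'mntr', or 'unreachable'."""
        try:
            with socket.create_connection((host, port), timeout=timeout) as sock:
                sock.sendall(b"mntr")
                data = b""
                while chunk := sock.recv(4096):
                    data += chunk
        except OSError:
            return "unreachable"
        for line in data.decode().splitlines():
            if line.startswith("zk_server_state"):
                return line.split()[-1]
        return "unknown"

    # Placeholder hostnames for a three-node keeper cluster.
    for node in ("keeper1", "keeper2", "keeper3"):
        print(node, keeper_state(node))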
