ClickHouse Overview
ClickHouse is a high-performance column-oriented database that is optimized for querying and reporting. ClickHouse is integrated into FortiSIEM version 6.5.0 and later. Most back end ClickHouse functions are managed by FortiSIEM automatically, minimizing the overhead on the user. The FortiSIEM GUI includes a ClickHouse configuration page where the keeper nodes, shards and clusters can be configured. Additional background information on ClickHouse can be found at https://clickhouse.com/.
ClickHouse is integrated into the central cluster of Supervisor and Worker nodes. ClickHouse runs on these nodes alongside the FortiSIEM application processes. This architecture means there is no need to deploy a separate database solution to store event data - the FortiSIEM Supervisor and Worker nodes deliver event processing, query processing and event storage in a single solution.
Important ClickHouse concepts include shards, replicas, and the ClickHouse keeper process. A simplified explanation of each is provided below.
Shards are pieces of the database distributed across multiple nodes. More shards mean faster database performance in most scenarios, because database operations are performed on multiple nodes simultaneously. The number of shards in a FortiSIEM cluster ranges from one for very small deployments to more than 15 for the largest deployments.
Replicas are copies of data stored on different nodes for resilience; many enterprise deployments will have two replicas. Replicas are local to a shard. For example, a single shard with two replicas would have two ClickHouse data nodes, each with a separate copy of the database. A two-shard, two-replica database would have four ClickHouse data nodes: two replicated nodes in shard 1, and two replicated nodes in shard 2:
| ClickHouse Shard 1 | ClickHouse Shard 2 |
|---|---|
| Node 1: Shard 1 / Replica 1 | Node 3: Shard 2 / Replica 1 |
| Node 2: Shard 1 / Replica 2 | Node 4: Shard 2 / Replica 2 |
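The relationship between shards, replicas, and data nodes described above can be sketched in a few lines of Python. This is an illustration only: `DataNode` and `layout` are hypothetical names that are not part of FortiSIEM or ClickHouse, and the sequential node numbering is an assumption that mirrors the example above. The sketch shows that the number of data nodes is the shard count multiplied by the replicas per shard.

```python
from typing import List, NamedTuple


class DataNode(NamedTuple):
    """One ClickHouse data node (hypothetical type, for illustration only)."""
    node: int     # sequential node number (assumed numbering scheme)
    shard: int    # shard this node belongs to
    replica: int  # replica index within that shard


def layout(shards: int, replicas: int) -> List[DataNode]:
    """Enumerate the data nodes for a given shard and replica count."""
    if shards < 1 or replicas < 1:
        raise ValueError("shards and replicas must both be at least 1")
    nodes: List[DataNode] = []
    for shard in range(1, shards + 1):
        # Replicas are local to a shard, so they are numbered per shard.
        for replica in range(1, replicas + 1):
            nodes.append(DataNode(len(nodes) + 1, shard, replica))
    return nodes


if __name__ == "__main__":
    for n in layout(shards=2, replicas=2):
        print(f"Node {n.node}: Shard {n.shard} / Replica {n.replica}")
```

Calling `layout(shards=2, replicas=2)` yields the four nodes from the two-shard, two-replica example, while `layout(shards=1, replicas=2)` yields the two nodes of the single-shard case.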
The ClickHouse keeper process manages insertion of data into the ClickHouse database. It synchronizes data writes and replication across multiple shards and replicas in a way that maintains data integrity. All deployments contain a keeper process running on at least one FortiSIEM node. Larger deployments have dedicated keeper nodes.
Important ClickHouse processes and node types that run on FortiSIEM nodes alongside the FortiSIEM application are:
| ClickHouse Node Role | ClickHouse Keeper | Data Node | Query Node |
|---|---|---|---|
| Description | Manages ClickHouse insertion and replication | Inserts data into ClickHouse Database | Reads data from ClickHouse Database |
| Can run on | Supervisor or Worker | Supervisor or Worker | Supervisor or Worker |
In many deployments, the same FortiSIEM node will be both a Data and Query node. Separate dedicated Keeper nodes are used in many deployments for the purposes of scalability and resilience, as described in the section ClickHouse Keeper Process Resilience.