FortiSIEM Reference Architecture Using ClickHouse

Design for Resilience

When designing resilience into the solution, consider:

  • Supervisor node availability and scaling

  • ClickHouse replicas

  • ClickHouse keeper node resilience

  • Architectural resilience provided by load balancers

  • Underlying hypervisor resilience features; many hypervisors provide features to increase the resilience of their hosted VMs

  • Host resilience features such as redundant PSUs, NICs, and fans, and storage array resilience

The key resilience points to review are:

Supervisor Resilience

FortiSIEM supports high availability Supervisor nodes as a licensed feature. This capability provides:

  • Up to 5 Supervisor nodes

  • CMDB, Incident, and SVN data are synchronized between the Supervisor nodes

  • One Supervisor node acts as the Primary Leader Supervisor node; subsequent Supervisor nodes become Primary Follower Supervisor nodes.

  • If the Primary Leader Supervisor node is unavailable, then a Primary Follower Supervisor node can be promoted to take the Leader role.

  • Scale-out of concurrent GUI users on the platform.

For more information on High Availability, see the documentation here: https://docs.fortinet.com/document/fortisiem/6.7.0/high-availability-and-disaster-recovery-procedures-clickhouse/933956/high-availability-and-disaster-recovery-clickhouse

ClickHouse Database Resilience

A ClickHouse replica is a copy of the data within a shard stored on another host in the shard. This provides resilience against the failure of a node within the shard - if a shard has two replicas and one of the hosts fails, the system will continue to use the remaining replica on the remaining host. Shards can have more than two replicas, which will increase the resilience but also the cost of the solution. For increased resilience, each replica should be hosted on a separate server and the data stored on a separate storage array. Hosting replicas on the same storage array leaves the solution more vulnerable to data loss due to a hardware failure.
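
FortiSIEM generates the ClickHouse cluster configuration automatically from the settings entered in the GUI, so the snippet below is only an illustrative sketch of the underlying ClickHouse concept, not a file that normally needs to be edited by hand. The cluster name and hostnames are placeholders; it shows a single shard with two replicas hosted on two separate Worker nodes, expressed in the standard ClickHouse remote_servers section:

  <clickhouse>
    <remote_servers>
      <!-- Cluster name and hostnames below are placeholders -->
      <fsiem_event_cluster>
        <shard>
          <!-- Two replicas of the same shard, each on its own host
               and ideally on separate storage arrays -->
          <replica>
            <host>worker1.example.local</host>
            <port>9000</port>
          </replica>
          <replica>
            <host>worker2.example.local</host>
            <port>9000</port>
          </replica>
        </shard>
      </fsiem_event_cluster>
    </remote_servers>
  </clickhouse>

If one of the two hosts fails, inserts and queries continue against the surviving replica; adding a third replica entry on a third host would further increase resilience, at additional cost.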

The ClickHouse keeper process is essential to the functioning of the system. If the keeper process is not available, then the system will operate in read-only mode and new events will be dropped. Take the following steps to increase the resilience of the keeper process:

  1. Deploy the keeper process on a separate, dedicated Worker node. This reduces the likelihood that the keeper node has to be rebooted, because it is not co-hosted with the Supervisor or active Worker node processes.

  2. Consider deploying three keeper nodes in a cluster for maximum resilience. In a three-node keeper cluster, one node can fail and the cluster will remain operational (a configuration sketch follows this list).

    1. Note that a two-node cluster does not provide resilience, due to the requirement for a quorum of nodes in a multi-node cluster. See the ClickHouse documentation for more information on this advanced concept.
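
As an illustration of the three-node keeper cluster described above, the sketch below shows a ClickHouse Keeper raft_configuration section listing three keeper hosts. The hostnames and server IDs are placeholders and the ports are generic ClickHouse defaults; in a FortiSIEM deployment this configuration is managed by the platform, so treat it as a conceptual sketch rather than a file to edit directly:

  <clickhouse>
    <keeper_server>
      <tcp_port>9181</tcp_port>   <!-- client port (ClickHouse default) -->
      <server_id>1</server_id>    <!-- unique ID per keeper node (1, 2, or 3) -->
      <raft_configuration>
        <!-- All three keeper nodes must be listed identically on every node -->
        <server>
          <id>1</id>
          <hostname>keeper1.example.local</hostname>
          <port>9234</port>
        </server>
        <server>
          <id>2</id>
          <hostname>keeper2.example.local</hostname>
          <port>9234</port>
        </server>
        <server>
          <id>3</id>
          <hostname>keeper3.example.local</hostname>
          <port>9234</port>
        </server>
      </raft_configuration>
    </keeper_server>
  </clickhouse>

With three nodes, the cluster retains a quorum (two of three) if any single keeper fails; with only two nodes, the loss of either one breaks quorum, which is why a two-node keeper cluster adds no resilience.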

Additional Architectural Resilience

FortiSIEM Supervisor and Worker nodes should be deployed on a high-performance, resilient, data-center-class network. High-performance LAN switches should be deployed in a hierarchical topology with resilient, high-bandwidth uplinks and fast failure-recovery mechanisms. If cluster traffic traverses a wide-area network (WAN), this should be an enterprise-class, resilient WAN that provides high bandwidth and low latency. FortiSIEM cluster traffic should be prioritized across the LAN and WAN to minimize latency between nodes.

Load balancers can be deployed at various points within the system both for scalability and increased responsiveness to a failure. This is discussed in more depth throughout the document; some examples include:

  • Load balancers can be installed in front of the Worker node cluster, with Collectors configured to upload to a shared virtual IP (VIP) on the load balancer that is balanced across a group of Workers. This makes the failure of a Worker less noticeable to the Collectors (a minimal load-balancer sketch follows this list).

    • This is optional. By default, FortiSIEM has an inbuilt load-sharing mechanism which distributes collectors across the Worker cluster and fails over to another Worker node in the event of a Worker failure.

  • Load balancers can be installed in front of a group of Collectors to provide resilience for inbound syslog and FortiSIEM agent connections.
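
As an example of the Worker VIP approach in the first bullet, the sketch below shows a minimal HAProxy configuration that passes Collector HTTPS uploads through to a pool of Worker nodes. The VIP address, Worker addresses, and names are placeholders, and any enterprise load balancer with TCP health checks can play the same role:

  # Minimal sketch: Collectors upload to the VIP (10.0.0.100, a placeholder)
  # instead of individual Worker nodes; HAProxy balances the TCP/443 sessions
  # across the Worker pool and removes failed Workers from rotation.
  frontend fsiem_worker_vip
      bind 10.0.0.100:443
      mode tcp
      default_backend fsiem_workers

  backend fsiem_workers
      mode tcp
      balance roundrobin
      server worker1 10.0.1.11:443 check
      server worker2 10.0.1.12:443 check
      server worker3 10.0.1.13:443 check

Using TCP passthrough (mode tcp) keeps TLS termination on the Worker nodes, so no certificates need to be installed on the load balancer.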

Many hypervisors include advanced features to increase the resilience and uptime of the VMs they host, and extensive hardware features are available to increase server and storage resilience. Be sure to work with the server team to take full advantage of these when designing the solution, and to understand the limitations and potential points of failure in the hosting environment, as these may affect the performance of the solution and the availability and integrity of the data hosted on it.
