Configuring ClickHouse Based Deployments
This section covers the following topics.
- ClickHouse Configuration Overview
- Configuring ClickHouse Storage on Supervisor Node
- Configuring ClickHouse Topology
- Creating ClickHouse Archive Storage
ClickHouse Configuration Overview
Before you begin, it may help to review the concepts in ClickHouse Operational Overview and the ClickHouse Sizing Guide. First, design your ClickHouse Online Cluster and decide the roles of the Supervisor and Worker nodes. There are three cases:
- Small deployments: All-in-one deployment using a Supervisor Virtual Machine or a hardware appliance such as FortiSIEM 2000G or 3500G.
- Medium sized deployments: Supervisor is a member of Keeper Cluster but not the Data Cluster. Workers are members of both Keeper and Data Clusters.
- Large deployments: Supervisor is not a part of Keeper or Data Clusters. Workers entirely form the Keeper and Data Clusters.
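The three cases above can be summarized as a small lookup table. The following is a minimal sketch for planning purposes only; the labels "small", "medium" and "large", the Membership type and the clusters_for helper are hypothetical names, not part of FortiSIEM, and the all-in-one row assumes the Supervisor carries both roles by itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Membership:
    """Which ClickHouse clusters a node role joins (illustrative only)."""
    keeper: bool
    data: bool


# Hypothetical labels for the three deployment cases described above.
DEPLOYMENT_ROLES = {
    # Assumption: in an all-in-one deployment the Supervisor does everything.
    "small": {"supervisor": Membership(keeper=True, data=True)},
    "medium": {
        "supervisor": Membership(keeper=True, data=False),
        "worker": Membership(keeper=True, data=True),
    },
    "large": {
        "supervisor": Membership(keeper=False, data=False),
        "worker": Membership(keeper=True, data=True),
    },
}


def clusters_for(size, role):
    """Return the list of clusters a role belongs to for a deployment size."""
    membership = DEPLOYMENT_ROLES[size][role]
    clusters = []
    if membership.keeper:
        clusters.append("keeper")
    if membership.data:
        clusters.append("data")
    return clusters
```

For example, clusters_for("medium", "supervisor") yields only the Keeper cluster, which matches the medium sized case where the Supervisor holds no event data.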
The configuration steps involve:
- Creating storage on Supervisor and Worker nodes depending on their role.
- Creating a ClickHouse topology to specify the Supervisor and Worker nodes belonging to Keeper cluster and Data cluster.
Next, configure the Archive, where events are stored after the Online data stores become full. The options are:
- For on-premises deployments, you can use a large Cold disk tier as Archive, or you can use real-time archive to NFS.
- For AWS Cloud deployments, you can use AWS S3 for Archive.
- For GCP deployments, you can use GCS for Archive.
After configuring the online and archive storage, you need to specify the retention policies. See How ClickHouse Event Retention Works for details.
For Online event database usage, see Viewing Online Event Data Usage.
For Archive event database usage, see Viewing Archive Data.
For Advanced Configuration Operations, see Advanced Operations in the Appendix.
Configuring ClickHouse Storage on Supervisor Node
Follow these steps:
Note: Make sure the license has been uploaded. To upload it, navigate to ADMIN > License and click Upload. For more information, refer to the FortiSIEM Licensing Guide.
- Navigate to ADMIN > Setup > Storage and click Online to choose storage.
- From the Event Database drop-down list, select ClickHouse.
- Set Storage Tiers and disks following the guidelines below.
- If your hardware model is 3600G or 2200G or 2000G:
- Storage Tiers count is set to 2 (Hot Tier and Warm Tier).
- The Hot and Warm Tier disks are pre-configured.
- If you want to add more storage, you can add it as a 3rd Tier. Set Storage Tiers to 3 and add NFS mounted storage in the Cold Tier.
- If your hardware model is 3500G or 2000F or 500G:
- Storage Tiers count is set to 1 (Hot Tier).
- Hot Tier disks are pre-configured.
- If you want to add more storage, you can add it as a 2nd and 3rd Tier. Set Storage Tiers to 2 or 3 and add NFS mounted storage in the Warm Tier or Cold Tier as needed.
- If you are running on VM, then you can set Storage Tiers to 1, 2 or 3 and add disks to each Tier. Note that NFS mounted disks cannot be added to Hot Tier for performance reasons.
- To add a disk at any tier, click + and enter the Disk Path.
- To specify a locally attached Disk Path, run the lsblk command to find the Disk Path, which should be of the form '/dev/<disk>'.
- To specify an NFS mounted Disk Path, enter it in the following format:
<NFS Server IP or HostName>:<exported mount point>
Example: 192.0.20.0:/mnt/warm
For more information, see steps 1 and 2 in the NFS Storage Guide.
Notes:
- The mount point must be different for each Worker node and each tier; otherwise, the data will be overwritten.
- You cannot mount NFS storage for Hot Tier.
If you plan to use Worker nodes to add storage, see Adding a Worker Node for details.
- Click Test, and if the test succeeds, click Deploy.
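The Disk Path rules in the steps above (local paths of the form /dev/<disk>, NFS paths of the form <server>:<exported mount point>, no NFS in the Hot Tier, and a distinct mount point per Worker and tier) can be checked before entering them in the GUI. The following is a minimal sketch under those assumptions; the function names, the regular expressions and the (node, tier, path) tuple layout are hypothetical and are not how FortiSIEM validates input.

```python
import re

# Patterns for the two Disk Path forms described above (assumed shapes).
LOCAL_DISK = re.compile(r"^/dev/[\w./-]+$")
NFS_EXPORT = re.compile(r"^[A-Za-z0-9.-]+:/\S*$")


def classify_disk_path(path):
    """Return "local" or "nfs" for a Disk Path string, or raise ValueError."""
    if LOCAL_DISK.match(path):
        return "local"
    if NFS_EXPORT.match(path):
        return "nfs"
    raise ValueError(f"unrecognized Disk Path: {path!r}")


def check_disks(disks):
    """Check (node, tier, path) tuples against the notes above.

    Returns a list of human-readable problems; an empty list means this
    sketch found nothing to complain about.
    """
    problems = []
    seen_nfs = {}  # NFS path -> "node/tier" that first used it
    for node, tier, path in disks:
        if classify_disk_path(path) != "nfs":
            continue
        if tier == "hot":
            problems.append(f"{node}: NFS path {path} is not allowed in the Hot Tier")
        if path in seen_nfs:
            problems.append(f"{path} is used by both {seen_nfs[path]} and {node}/{tier}")
        else:
            seen_nfs[path] = f"{node}/{tier}"
    return problems
```

Calling check_disks with the same NFS export listed for two Workers reports the reuse, which is the situation the notes warn will overwrite data.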
Configuring ClickHouse Topology
After configuring storage, you need to set up the ClickHouse topology. This involves:
- Selecting the Supervisor or Worker nodes that belong to the ClickHouse Keeper Cluster.
- Choosing the number of shards for the ClickHouse Data cluster.
- Selecting the Worker nodes that belong to the ClickHouse Data cluster.
See ClickHouse Configuration for details.
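Before entering a topology in the GUI, it can help to write it down and sanity-check it. The following is a minimal sketch, not FortiSIEM's own validation: it represents the Data cluster as a list of shards, each a list of replica node names, and assumes (as a planning convention, not a documented rule) that a node serves as a replica in only one shard. The function name and node names are hypothetical.

```python
def validate_topology(keeper_nodes, shards):
    """Return a list of problems found in a planned ClickHouse topology.

    keeper_nodes: list of node names in the Keeper cluster.
    shards: list of shards; each shard is a list of replica node names.
    """
    problems = []
    if not keeper_nodes:
        problems.append("Keeper cluster has no nodes")
    shard_of = {}  # node name -> shard number where it was first seen
    for number, replicas in enumerate(shards, start=1):
        if not replicas:
            problems.append(f"shard {number} has no replicas")
        for node in replicas:
            if node in shard_of:
                problems.append(
                    f"{node} appears in shard {shard_of[node]} and shard {number}"
                )
            else:
                shard_of[node] = number
    return problems


# Example plan: Supervisor as the only Keeper node, two shards of Workers.
print(validate_topology(["supervisor"], [["worker1", "worker2"], ["worker3"]]))
```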
Creating ClickHouse Archive Storage
The options are:
- For on-premises deployments, you can use a large Cold disk tier as Archive, or you can use real-time archive to NFS.
- For AWS Cloud deployments, you can use AWS S3 for Archive.
- For GCP deployments, you can use GCS for Archive.
Case 1: To use the ClickHouse Cold tier as Archive, configure a Cold storage tier on each node in the ClickHouse Data Cluster. See Adding a Worker Node for details.
Case 2: To configure real-time archive using NFS, follow these steps:
- Go to ADMIN > Setup > Storage.
- Click Archive, and select NFS.
- Enter the following parameters:
- IP/Host: [Required] Select IP or Host and enter the IP address/Host name of the NFS server.
- Exported Directory: [Required] Enter the file path on the NFS Server which will be mounted.
- Click Test.
- If the test succeeds, click Deploy.
Case 3: To configure AWS S3 for Archive, follow these steps:
- Go to ADMIN > Setup > Storage.
- Click Archive, and select AWS S3.
- For Credential Type, select Environmental Credentials or Explicit Credentials.
- If Environmental Credentials is selected, you need an Identity and Access Management (IAM) policy. Follow the instructions in Creating IAM Policy for AWS S3 Explicit Credentials to create the IAM policy.
- If Explicit Credentials is selected, then enter the following information:
- Access Key ID: Access Key ID required to access the S3 bucket(s)
- Secret Access Key: The Secret Access Key associated with the Access Key ID to access the S3 bucket(s)
- For Buckets:
- In the Bucket field, enter the bucket URL.
- In the Region field, enter the region. For example, "us-east-1".
Note: To minimize any latency, enter the closest region.
- If more Buckets are required, click + to add a new row.
- Click Test.
- If the test succeeds, click Deploy.
- Configure each ClickHouse Worker to use the configured S3 bucket.
- Navigate to ADMIN > License > Nodes, edit each Worker, check AWS S3 and choose the Bucket from the drop-down.
- Click Test, and if the test succeeds, click Deploy.
- If the Supervisor is used as a ClickHouse node, take the following steps:
- Navigate to ADMIN > Setup > Storage, click Online, check AWS S3 and choose the Bucket from the drop-down.
- Click Test, and if the test succeeds, click Deploy.
- Apply AWS S3 as the new storage policy to the ClickHouse cluster by taking the following steps.
- Navigate to ADMIN > Settings > Database > ClickHouse Config.
- Add the AWS S3 bucket(s) to your ClickHouse Cluster Configuration using the appropriate Shard # > Replica # drop-down list.
- Click Test, and if the test succeeds, click Deploy.
Implementation Notes:
- AWS S3 buckets MUST be created prior to this configuration.
- When storing ClickHouse data in AWS S3, Fortinet recommends turning Bucket Versioning off, or suspending it (if it was previously enabled). This is because data in ClickHouse files may change and versioning will keep both copies of data - new and old. With time, the number of stale objects may increase, resulting in higher AWS S3 costs. If versioning was previously enabled for the bucket, Fortinet recommends suspending it and configuring a policy to delete non-current versions.
- Archive data will NOT be automatically purged by FortiSIEM or ClickHouse.
- The S3 archive folder is not created until the Worker performs its first archive to S3.
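The versioning note above recommends a policy that deletes non-current object versions. The following is a minimal sketch of such a policy, built as an S3 lifecycle configuration document and printed as JSON. The rule ID, the 7-day window and the function name are assumptions chosen for illustration; pick values that suit your own retention needs.

```python
import json


def noncurrent_version_cleanup(days=7, rule_id="expire-noncurrent-versions"):
    """Build an S3 lifecycle configuration that expires non-current versions.

    days and rule_id are illustrative defaults, not Fortinet recommendations.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Status": "Enabled",
                # An empty prefix applies the rule to every object in the bucket.
                "Filter": {"Prefix": ""},
                "NoncurrentVersionExpiration": {"NoncurrentDays": days},
            }
        ]
    }


print(json.dumps(noncurrent_version_cleanup(), indent=2))
```

The printed document has the shape accepted by the AWS CLI command aws s3api put-bucket-lifecycle-configuration; suspending versioning itself is a separate bucket setting (for example, aws s3api put-bucket-versioning with Status=Suspended).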
Case 4: To configure GCS for Archive, follow these steps:
- Go to ADMIN > Setup > Storage.
- Click Archive, and select GCS.
- Enter the following information:
- Access Key ID: Access Key ID required to access the GCS bucket(s)
- Secret Access Key: The Secret Access Key associated with the Access Key ID to access the GCS bucket(s)
Note: See the Google IAM documentation for more information about keys.
- For Buckets:
- In the Bucket field, enter the bucket.
- If more Buckets are required, click + to add a new row.
Note: See the Google Cloud Storage documentation for more information about buckets.
- Click Test.
- If the test succeeds, click Deploy.
- Configure each ClickHouse Worker to use the configured GCS bucket.
- Navigate to ADMIN > License > Nodes, edit each Worker, check Archive GCS and choose the Bucket from the GCS Bucket drop-down.
- Click Test, and if the test succeeds, click Deploy.
- If the Supervisor is used as a ClickHouse node, take the following steps:
- Navigate to ADMIN > Setup > Storage, click Online, check Archive GCS and choose the Bucket from the GCS Bucket drop-down.
- Click Test, and if the test succeeds, click Deploy.
- Apply GCS as the new storage policy to the ClickHouse cluster by taking the following steps.
- Navigate to ADMIN > Settings > Database > ClickHouse Config.
- Add the GCS bucket(s) to your ClickHouse Cluster Configuration by using the appropriate Shard # > Replica # drop-down list.
- Click Test, and if the test succeeds, click Deploy.
Implementation Notes:
- GCS buckets MUST be created prior to this configuration.
- When storing ClickHouse data in GCS, Fortinet recommends turning Bucket Versioning off, or suspending it (if it was previously enabled). This is because data in ClickHouse files may change and versioning will keep both copies of data - new and old. With time, the number of stale objects may increase, resulting in higher GCS costs. If versioning was previously enabled for the bucket, Fortinet recommends suspending it and configuring a policy to delete non-current versions.
- Archive data will NOT be automatically purged by FortiSIEM or ClickHouse.
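For GCS, the policy to delete non-current versions mentioned in the notes above can be expressed as a bucket lifecycle rule. The following is a minimal sketch that builds such a rule and prints it as JSON. The 7-day window and the function name are assumptions for illustration, and how you apply the resulting file depends on the Google Cloud tooling you use.

```python
import json


def gcs_noncurrent_cleanup(days=7):
    """Build a GCS lifecycle document that deletes non-current versions.

    days is an illustrative default, not a Fortinet recommendation.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "rule": [
            {
                "action": {"type": "Delete"},
                # Matches object versions that became non-current at least
                # this many days ago; live objects are not affected.
                "condition": {"daysSinceNoncurrentTime": days},
            }
        ]
    }


print(json.dumps(gcs_noncurrent_cleanup(), indent=2))
```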
Creating IAM Policy for AWS S3 Explicit Credentials
Take the following steps from your AWS console.
- From your EC2 Dashboard, select your instance.
- Navigate to the IAM dashboard.
Note: You can go there by clicking the IAM button, or by clicking on Services and selecting IAM.
- Click Policies to navigate to the Policies page, and click Create policy.
- From the Create policy page, click the JSON tab.
- Paste the following JSON code into the editor to configure your policy.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:ListStorageLensConfigurations",
        "s3:ListAccessPointsForObjectLambda",
        "s3:GetAccessPoint",
        "s3:PutAccountPublicAccessBlock",
        "s3:GetAccountPublicAccessBlock",
        "s3:ListAllMyBuckets",
        "s3:ListAccessPoints",
        "s3:PutAccessPointPublicAccessBlock",
        "s3:ListJobs",
        "s3:PutStorageLensConfiguration",
        "s3:ListMultiRegionAccessPoints",
        "s3:CreateJob"
      ],
      "Resource": "*"
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::demo-bucket",
        "arn:aws:s3:::demo-bucket/*"
      ]
    }
  ]
}
- Click the Next: Tags button.
Note: Tags do not need to be configured.
- Click the Next: Review button.
- On the Create policy page, in the Name field, enter a name for the policy.
- Click the Create policy button. Your policy has been created.
- Navigate back to the IAM dashboard, click Roles, and then click Create role.
- For Select trusted entity, select AWS service.
- Under Use case, select EC2.
- Click Next, and then click Next again.
- On the Name, review, and create page, in the Role name field, enter a name for the role.
- Under Step 2: Add permissions, click the Edit button, select the policy you created earlier, and click Next.
- Click Create role.
- Navigate to the Instances page, select your instance and click the Security tab.
- Click Actions (located upper left), and select Security > Modify IAM role.
- Select the role you just created, and click Update IAM role.
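The policy in the steps above grants full S3 access to a placeholder bucket named demo-bucket. When you use your own bucket names, both the bucket ARN and the object ARN (with the trailing /*) need to be listed. The following is a minimal sketch that generates the bucket-scoped statement for one or more buckets; the function name and the example bucket name are hypothetical, and the Sid simply mirrors the one shown above.

```python
import json


def bucket_statement(bucket_names, sid="VisualEditor1"):
    """Build the bucket-scoped Allow statement for the given bucket names."""
    resources = []
    for name in bucket_names:
        resources.append(f"arn:aws:s3:::{name}")    # the bucket itself
        resources.append(f"arn:aws:s3:::{name}/*")  # every object in it
    return {
        "Sid": sid,
        "Effect": "Allow",
        "Action": "s3:*",
        "Resource": resources,
    }


# "my-archive-bucket" is a made-up name used only for illustration.
print(json.dumps(bucket_statement(["my-archive-bucket"]), indent=2))
```

The printed statement can replace the second statement of the policy JSON before you paste it into the editor.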