Administration Guide

Key terms and concepts

This section contains key terms used in FortiAnalyzer-BigData.

Security Event Manager

The Security Event Manager is the cluster formed by multiple server blades or virtual machine instances that serves the web GUI and performs the workload for data processing, persistence, query, and management of security log events.

Security Event Manager Controller

The Security Event Manager Controller, or cluster controller, is a single host within the Security Event Manager that functions as the main controller for the hosts. This host is responsible for DHCP, configuration management, and lifecycle management tasks such as upgrades and resets. If this host goes down, its role automatically fails over to a standby host.

To find out which of the hosts is the active Controller host, go to the Host view in the Cluster Management GUI, where the active Controller will be highlighted.

Alternatively, you can run the following CLI command:

fazbdctl show members

The active controller is identified in the Role column.

Blade

This refers to the physical blade server enclosed within the FortiAnalyzer-BigData chassis.

The Chassis Management Module

The Chassis Management Module (CMM) is used to remotely manage and monitor server hosts, power supplies, cooling fans, and networking switches for the FortiAnalyzer-BigData unit. The CMM comes with a web management utility that consolidates and simplifies system management for the FortiAnalyzer-BigData chassis.

The web management utility aggregates and displays data from the CMM and provides the following key management features:

  • Enables administrators to view in-depth hardware-level status information using a single interface.
  • Provides an OS-independent, remote graphical console.
  • Allows remote users to control power to all blades or to individual blades.
Columnar Data Store

Unlike the traditional FortiAnalyzer data storage, FortiAnalyzer-BigData relies on the Kudu storage engine, which stores data in a columnar fashion.

Tables are split into contiguous segments called tablets, which are the logical units used for replication and parallelization. The replication factor is 3, which means three copies are stored in the system: one original copy and two replicas. The replicas are guaranteed to be spread across different nodes for fault tolerance.

A tablet with N replicas (usually three or five) can continue to accept writes with up to (N – 1) / 2 faulty replicas. For example, with the default replication factor of 3, a tablet remains writable if one replica fails.

Kudu uses the Raft consensus algorithm to elect masters and tablet leaders, and to determine the success or failure of a given write operation, which enforces data integrity across replicas.

In a row layout, each record is stored as a whole; in a columnar layout, the values of each column are stored contiguously. The columnar layout allows aggressive compression and makes it possible to read only the columns a query needs.
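
The following minimal Python sketch contrasts the two layouts for the same log records. The field names and values are invented for illustration and do not reflect the actual FortiAnalyzer-BigData schema:

# Minimal sketch contrasting row-oriented and columnar layouts for the
# same log records; field names and values are invented for illustration.

# Row layout: each record is stored (and read) as a whole.
rows = [
    {"itime": 1700000000, "devid": "FGT-A", "action": "deny",  "bytes": 120},
    {"itime": 1700000001, "devid": "FGT-B", "action": "allow", "bytes": 980},
    {"itime": 1700000002, "devid": "FGT-A", "action": "allow", "bytes": 450},
]

# Columnar layout: the values of each column are stored contiguously, so
# similar values sit next to each other and compress well.
columns = {
    "itime":  [1700000000, 1700000001, 1700000002],
    "devid":  ["FGT-A", "FGT-B", "FGT-A"],
    "action": ["deny", "allow", "allow"],
    "bytes":  [120, 980, 450],
}

# A query that needs only two columns touches only those two arrays in the
# columnar layout instead of scanning every full record.
total_allowed_bytes = sum(
    b for a, b in zip(columns["action"], columns["bytes"]) if a == "allow"
)
print(total_allowed_bytes)  # 1430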

The Kudu data store also makes stored log data mutable, which means that stored log events can be updated after they are written.

Controller

This refers to the Security Event Manager Controller.

Data Flow

The write and read paths for log data inside the platform are described below.

Write Path:

The write path consists of the following steps:

  1. Logs generated by logging devices arrive at the Main Host, where the fortilogd process stores them temporarily on local storage that serves as a buffer before they are distributed across all hosts.
  2. Logs are packed into a memory-efficient binary format and then streamed by the sqllogd daemon to the Ingestion Services, which are Kubernetes pods running on each of the Security Event Manager hosts that accept the log data and forward it to the distributed stateful workload engines.
  3. The first distributed stateful engine to receive the log data from the Ingestion Services is Kafka, which acts as a distributed buffering platform for the logs.
  4. The Spark distributed engine then pulls the logs from the Kafka buffers in parallel streams and processes them in fault-tolerant micro-batches. The micro-batches are streamed to Kudu, which acts as a distributed storage engine. Kudu processes store the logs in a columnar data store, where they can be easily retrieved (a streaming sketch follows this list).
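
The following is an illustrative sketch only, not the actual FortiAnalyzer-BigData implementation: a generic Spark Structured Streaming job that consumes log records from a Kafka topic and appends each micro-batch to a Kudu table through the kudu-spark data source. All server, topic, and table names are hypothetical:

# Illustrative sketch only: a generic Spark Structured Streaming job that
# reads log records from Kafka and appends each micro-batch to a Kudu
# table. Server, topic, and table names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("log-ingestion-sketch").getOrCreate()

# Pull log records from a Kafka topic in parallel streams.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "kafka-broker:9092")  # hypothetical
       .option("subscribe", "security-logs")                    # hypothetical topic
       .load())

logs = raw.selectExpr("CAST(value AS STRING) AS raw_log")

# Append each fault-tolerant micro-batch to Kudu via the kudu-spark data source.
def write_batch_to_kudu(batch_df, batch_id):
    (batch_df.write
     .format("kudu")
     .option("kudu.master", "kudu-master:7051")      # hypothetical
     .option("kudu.table", "impala::default.logs")   # hypothetical
     .mode("append")
     .save())

query = (logs.writeStream
         .foreachBatch(write_batch_to_kudu)
         .option("checkpointLocation", "/tmp/log-ingestion-chk")  # enables fault tolerance
         .start())
query.awaitTermination()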

Read Path:

The read path consists of the following steps:

  1. An administrator accesses the logs via FortiView or Log View.
  2. The logs are queried through the REST API and the Connector Services pods, which are Kubernetes pods providing the interface between the Main Host and the Security Event Manager hosts.
  3. The REST API calls are translated into SQL queries and forwarded to Impala, which acts as a distributed SQL engine.
  4. Impala coordinates and distributes the queries across the Kudu processes, enabling massively parallel processing (MPP); see the query sketch after this list.
  5. The logs pulled from Kudu are then forwarded to the FortiAnalyzer web services and displayed in the GUI.
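
As an illustration of step 4, the sketch below issues a SQL query to Impala with the open-source impyla client. The host, table, and column names are hypothetical and do not reflect the internal FortiAnalyzer-BigData schema or its REST API:

# Illustrative sketch only: querying a Kudu-backed table through Impala
# with the impyla client. Host, table, and column names are hypothetical.
from impala.dbapi import connect

conn = connect(host="impala-daemon", port=21050)  # 21050: default Impala HiveServer2 port
cur = conn.cursor()

# Impala distributes the query across the Kudu tablet servers (MPP); only
# the referenced columns are read from the columnar store.
cur.execute("""
    SELECT devid, COUNT(*) AS events
    FROM logs
    WHERE itime >= days_sub(now(), 1)
    GROUP BY devid
    ORDER BY events DESC
    LIMIT 10
""")
for devid, events in cur.fetchall():
    print(devid, events)

cur.close()
conn.close()
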
Data Management

The concepts of "Archive logs" and "Analytics logs" do not apply to FortiAnalyzer-BigData. All logs are load-balanced across all hosts, where the data is compressed, replicated, and available for immediate analytics.

Logs are stored in the CFile format and occupy approximately 300 bytes per log event after replication (x3) and compression.
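
For a rough, back-of-the-envelope capacity estimate based on this ~300 bytes per log figure, the log rate and retention period below are made-up inputs:

# Back-of-the-envelope sizing using the ~300 bytes per stored log figure
# above; the log rate and retention period are hypothetical inputs.
BYTES_PER_LOG = 300            # approx. on-disk footprint after 3x replication and compression
log_rate_per_second = 20_000   # hypothetical sustained log rate
retention_days = 90            # hypothetical retention policy

total_logs = log_rate_per_second * 86_400 * retention_days
storage_tib = total_logs * BYTES_PER_LOG / 2**40

print(f"{total_logs:,} logs -> ~{storage_tib:.1f} TiB of cluster storage")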

Host

This refers to one of the server hosts in the FortiAnalyzer-BigData system.

Instances

Also known as service instances. This refers to the instances serving a service. There are usually multiple instances running behind a service load balancer.

Main host

The FortiAnalyzer-BigData main host is responsible for collecting logs and providing the services for FortiView, Log View, Reports, and more.

For FortiAnalyzer-BigData units, the main host runs on Blade A1.

Roles

The FortiAnalyzer-BigData hosts are categorized into different roles according to the kind of stateful services running on them. The roles are assigned automatically during the cluster initialization. The placement of those stateful services on each role is designed to achieve optimized performance, high data and service availability and scalability, and is immutable after the cluster is initialized. In a scaling-out scenario (see Scaling FortiAnalyzer-BigData), additional hosts can be added as Data nodes to the existing cluster. For FortiAnalyzer-BigData units, the additional hosts can be added on the extender chassis to the existing cluster in the main chassis.

FortiAnalyzer-BigData has the following roles and services:

  • Main Node (this role exists for FortiAnalyzer-BigData-VM only)
    • Log Collector
    • Main Services
  • Master Node
    • Consul
    • Controller Service
    • HDFS Datanode
    • HDFS Journalnode
    • Impala
    • Kafka Broker
    • Kudu Master
    • Kudu Tablet Server
    • Zookeeper
  • MetaStore Node
    • HDFS Datanode
    • HDFS Namenode
    • Hive Metastore
    • Impala
    • Impala Catalog
    • Impala Statestore
    • Kafka Broker
    • Kudu Tablet Server
  • Data Node
    • HDFS Datanode
    • Impala
    • Kafka Broker
    • Kudu Tablet Server
Services

This refers to the Security Event Manager services that are responsible for security data management, security data processing, storage, cluster management, and more.

Storage Pool

A Storage Pool is a set of one or more ADOMs. Storage pools provide fine-grained control over the data retention policy and improve query and ingestion performance. Each storage pool can have its own data retention policy that controls the maximum age (in days) and disk utilization of the data. ADOMs within the same storage pool share the storage pool's resources.

We recommend grouping ADOMs with similar log rates and data retention requirements into a storage pool. For example, group small ADOMs (in terms of log rate and data volume) into one storage pool and larger ADOMs into another. If ADOMs of different sizes are grouped into one storage pool, the query performance on the smaller ADOMs will be affected by the larger ADOMs.
