Current Thresholds for Health Status

The following table lists the thresholds that determine whether each Health JSON attribute is reported as Normal, Warning, or Critical.

Health JSON Attribute	Applicability	Threshold
CPU Utilization	All nodes	Normal - if cpuUsage.used_pct is less than 75 AND the 15-minute loadAverage is less than nodes.hardware.vCPU (that is, less than 1 * number_of_cores). Warning - if cpuUsage.used_pct is between 75 and 90 OR the 15-minute loadAverage is between nodes.hardware.vCPU and 2 * nodes.hardware.vCPU (that is, less than 2 * number_of_cores). Critical - if cpuUsage.used_pct is greater than 90 OR the 15-minute loadAverage is greater than 2 * nodes.hardware.vCPU (that is, greater than 2 * number_of_cores).
Memory Utilization	All nodes	Normal - if memoryUsage.used_pct is less than 75. Warning - if memoryUsage.used_pct is between 75 and 90. Critical - if memoryUsage.used_pct is greater than 90.
Swap Space Utilization	All nodes	Normal - if swapUsage.in_bps is less than 7 Mbps. Warning - if swapUsage.in_bps is between 7 Mbps and 10 Mbps. Critical - if swapUsage.in_bps is greater than 10 Mbps.
Disk Utilization	All nodes; skips /data, data-clickhouse	Normal - if diskUsage.used_pct is less than 65. Warning - if diskUsage.used_pct is between 65 and 85. Critical - if diskUsage.used_pct is greater than 85.
I/O Utilization	All nodes	Normal - if cpuUsage.ioWait_pct is less than 15 AND diskIO.readWait_ms is less than 30 AND diskIO.writeWait_ms is less than 30 AND nfsIO.readLatency_ms is less than 50 AND nfsIO.writeLatency_ms is less than 50. Warning - if cpuUsage.ioWait_pct is between 15 and 30 OR diskIO.readWait_ms is between 30 and 60 OR diskIO.writeWait_ms is between 30 and 60 AND nfsIO.readLatency_ms is between 50 and 75 AND nfsIO.writeLatency_ms is between 50 and 75. Critical - if cpuUsage.ioWait_pct is more than 30 OR diskIO.readWait_ms is more than 60 OR diskIO.writeWait_ms is more than 60 OR nfsIO.readLatency_ms is more than 75 OR nfsIO.writeLatency_ms is more than 75.
Process Health	All nodes	Normal - if processStat.uptime is more than 1 hour AND processStat.cpuUtil_pct is less than 50 AND processStat.memoryUtil_pct is less than 50. Warning - if processStat.uptime is less than 1 hour OR processStat.cpuUtil_pct is between 50 and 75 OR processStat.memoryUtil_pct is between 50 and 75. Critical - if the process is DOWN OR processStat.cpuUtil_pct is greater than 75 OR processStat.memoryUtil_pct is greater than 75.
Event Pipeline	Collector only	Indicates whether queues are building up in Collectors. Normal - if eventUploadQueue.total_mb is less than 20. Warning - if eventUploadQueue.total_mb is between 20 and 50. Critical - if eventUploadQueue.total_mb is greater than 50.
Event Pipeline	Worker only	Indicates whether queues are building up in Workers. A buildup may be caused by Workers that are slow to ingest events into storage. Normal - if eventUploadQueue.disk_mb is less than 25. Warning - if eventUploadQueue.disk_mb is between 25 and 75. Critical - if eventUploadQueue.disk_mb is greater than 75.
Shared Store	Worker, Supervisor	Indicates that some FortiSIEM processes are slow to process events and may eventually block the writer phParser process from ingesting events, in which case events may be lost. Normal - if the difference between the reader's and the writer's sharedStore_pct is less than 15. Warning - if the difference is between 15 and 30. Critical - if the difference is more than 30.
Last Status Updated	All nodes	Based on the health updates between Collector and Supervisor, between Worker and Supervisor, and between Instance Supervisor and FortiSIEM Manager. Normal - if nodes.metrics.lastStatusUpdated is delayed by less than 15 minutes. Warning - if nodes.metrics.lastStatusUpdated is delayed by between 15 and 20 minutes. Critical - if nodes.metrics.lastStatusUpdated is delayed by more than 20 minutes.
Last Event Time	Collector	Each Worker sends this information to the Supervisor, based on what the Worker receives from Collectors. It detects whether Collectors are falling behind in sending events to Workers. A delay may be caused by Workers that are slow to ingest events into storage, or by Collectors that are slow to process events and upload them to Workers. Normal - if nodes.metrics.lastEventTime is delayed by less than 15 minutes. Warning - if nodes.metrics.lastEventTime is delayed by between 15 and 20 minutes. Critical - if nodes.metrics.collectorUploadStatus.lastEventTime is delayed by more than 20 minutes.
Last File Received	Collector	Each Worker sends this information to the Supervisor, based on what the Worker receives from Collectors. A delay may be caused by Workers that are slow to ingest events into storage, or by Collectors that are slow to process events and upload them to Workers. Normal - if nodes.metrics.lastFileReceived is delayed by less than 15 minutes. Warning - if nodes.metrics.lastFileReceived is delayed by between 15 and 20 minutes. Critical - if nodes.metrics.lastFileReceived is delayed by more than 20 minutes.
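
As an illustration of how a client could apply these thresholds to values it has already extracted from the Health JSON, the following Python sketch classifies a few of the single-metric rows and the compound CPU row. It is not FortiSIEM's implementation. The function names (classify_band, classify_simple, classify_cpu) and the SIMPLE_THRESHOLDS dictionary are hypothetical, and the sketch assumes that boundary values (for example, exactly 75 or 90) fall in the Warning band, since the table describes Warning as "between" the two limits. It also assumes that Critical takes precedence when both a Warning and a Critical condition hold.

```python
# Hypothetical helper names; thresholds are copied from the table above.
# Each entry maps an attribute to (warning lower limit, warning upper limit).
SIMPLE_THRESHOLDS = {
    "memoryUsage.used_pct": (75, 90),
    "diskUsage.used_pct": (65, 85),
    "eventUploadQueue.total_mb": (20, 50),  # Collector only
    "eventUploadQueue.disk_mb": (25, 75),   # Worker only
}


def classify_band(value, warn_low, warn_high):
    """Return the status for a single metric with one Warning band.

    Assumption: values equal to either limit count as Warning.
    """
    if value < warn_low:
        return "Normal"
    if value <= warn_high:
        return "Warning"
    return "Critical"


def classify_simple(attribute, value):
    """Classify one of the single-metric attributes listed above."""
    warn_low, warn_high = SIMPLE_THRESHOLDS[attribute]
    return classify_band(value, warn_low, warn_high)


def classify_cpu(used_pct, load_avg_15m, vcpu):
    """Classify CPU Utilization from usage and the 15-minute load average.

    Assumption: Critical is checked first, so it wins over Warning.
    """
    if used_pct > 90 or load_avg_15m > 2 * vcpu:
        return "Critical"
    if used_pct < 75 and load_avg_15m < vcpu:
        return "Normal"
    return "Warning"


if __name__ == "__main__":
    print(classify_simple("memoryUsage.used_pct", 80))
    print(classify_simple("diskUsage.used_pct", 90))
    print(classify_cpu(used_pct=50, load_avg_15m=5.0, vcpu=4))
```

In the last call, CPU usage alone would be Normal, but a 15-minute load average of 5.0 on a 4-vCPU node lies between 1x and 2x the vCPU count, so the combined status is Warning.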

