Current Thresholds for Health Status

The following table lists the Normal, Warning, and Critical thresholds for selected Health JSON attributes.

Health JSON Attribute

Applicability

Threshold

CPU Utilization

All nodes

  • Normal - if cpuUsage.used_pct is less than 75 AND the 15-minute loadAverage is less than nodes.hardware.vCPU (that is, less than 1 * number_of_cores)

  • Warning - if cpuUsage.used_pct is between 75 and 90 OR the 15-minute loadAverage is between nodes.hardware.vCPU and 2 * nodes.hardware.vCPU (that is, less than 2 * number_of_cores)

  • Critical - if cpuUsage.used_pct is greater than 90 OR the 15-minute loadAverage is greater than 2 * nodes.hardware.vCPU (that is, greater than 2 * number_of_cores)
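The CPU rule combines two signals, so a short sketch may help when reading it. The function below is an illustration only, not FortiSIEM code: the function name and parameter names are hypothetical, and it assumes that values falling exactly on a boundary (75, 90, 1x or 2x the vCPU count) count as Warning, which the table does not state.

```python
def cpu_status(used_pct: float, load_avg_15m: float, vcpu: int) -> str:
    """Classify CPU health from utilization and the 15-minute load average."""
    # Critical: either signal is above its upper limit.
    if used_pct > 90 or load_avg_15m > 2 * vcpu:
        return "Critical"
    # Normal: both signals are below their lower limits.
    if used_pct < 75 and load_avg_15m < vcpu:
        return "Normal"
    # Anything in between is Warning (boundary values are assumed to land here).
    return "Warning"
```

For a node with 4 vCPUs, a load average of 6.0 yields Warning even when utilization is low, because 6.0 lies between 4 and 8.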

Memory Utilization

All nodes

  • Normal - if memoryUsage.used_pct less than 75

  • Warning - if memoryUsage.used_pct between 75 and 90

  • Critical - if memoryUsage.used_pct greater than 90

Swap Space Utilization

All nodes

  • Normal - if swapUsage.in_bps less than 7Mbps

  • Warning - if swapUsage.in_bps between 7Mbps and 10Mbps

  • Critical - if swapUsage.in_bps greater than 10Mbps

Disk Utilization

All nodes; skips /data, data-clickhouse

  • Normal - if diskUsage.used_pct less than 65

  • Warning - if diskUsage.used_pct between 65 and 85

  • Critical - if diskUsage.used_pct greater than 85
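The memory, swap, and disk rows each compare a single value against two limits, so they can be expressed as one lookup. This is a minimal sketch under stated assumptions: the `BANDS` table and `band_status` function are hypothetical names, the swap value is assumed to be already converted to Mbps, and boundary values are assumed to count as Warning.

```python
# (warning starts at, critical starts above) for each attribute, copied from the rows above.
BANDS = {
    "memoryUsage.used_pct": (75, 90),
    "swapUsage.in_bps": (7, 10),    # limits are given in Mbps in the table
    "diskUsage.used_pct": (65, 85),
}


def band_status(attribute: str, value: float) -> str:
    """Map a single metric value to Normal, Warning, or Critical."""
    low, high = BANDS[attribute]
    if value < low:
        return "Normal"
    if value > high:
        return "Critical"
    # Values from low to high inclusive are treated as Warning (an assumption).
    return "Warning"
```

Note that the disk limits (65 and 85) are lower than the memory limits (75 and 90), so the same percentage can map to different states depending on the attribute.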

I/O Utilization

All nodes

  • Normal - if cpuUsage.ioWait_pct less than 15 AND diskIO.readWait_ms less than 30 AND diskIO.writeWait_ms less than 30 AND nfsIO.readLatency_ms less than 50 AND nfsIO.writeLatency_ms less than 50

  • Warning - if cpuUsage.ioWait_pct between (15 and 30) OR diskIO.readWait_ms between (30 and 60) OR diskIO.writeWait_ms between (30 and 60) OR nfsIO.readLatency_ms between (50 and 75) OR nfsIO.writeLatency_ms between (50 and 75)

  • Critical - if cpuUsage.ioWait_pct more than 30 OR diskIO.readWait_ms more than 60 OR diskIO.writeWait_ms more than 60 OR nfsIO.readLatency_ms more than 75 OR nfsIO.writeLatency_ms more than 75
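One way to read the I/O row is as a "worst metric wins" rule across five values. The sketch below takes the Normal and Critical conditions as written and treats everything else as Warning. The `IO_LIMITS` table and `io_status` function are hypothetical names, and the sketch assumes all five metrics are present for the node.

```python
# (normal below, critical above) for each I/O metric in the row above.
IO_LIMITS = {
    "cpuUsage.ioWait_pct": (15, 30),
    "diskIO.readWait_ms": (30, 60),
    "diskIO.writeWait_ms": (30, 60),
    "nfsIO.readLatency_ms": (50, 75),
    "nfsIO.writeLatency_ms": (50, 75),
}


def io_status(metrics: dict) -> str:
    """Classify I/O health from a dict keyed by the metric names in IO_LIMITS."""
    # Critical: any single metric is above its upper limit.
    if any(metrics[name] > high for name, (_, high) in IO_LIMITS.items()):
        return "Critical"
    # Normal: every metric is below its lower limit.
    if all(metrics[name] < low for name, (low, _) in IO_LIMITS.items()):
        return "Normal"
    # Otherwise at least one metric sits in its middle band.
    return "Warning"
```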

Process Health

All nodes

  • Normal - if processStat.uptime more than 1 hour AND processStat.cpuUtil_pct less than 50 AND processStat.memoryUtil_pct less than 50

  • Warning - if processStat.uptime less than 1 hour OR processStat.cpuUtil_pct between (50 and 75) OR processStat.memoryUtil_pct between (50 and 75)

  • Critical - if process is DOWN OR processStat.cpuUtil_pct greater than 75 OR processStat.memoryUtil_pct greater than 75
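The process rule adds an uptime condition: a recently restarted process is flagged even if its resource usage is low. A minimal sketch, assuming uptime is available in seconds and that boundary values count as Warning; the function and parameter names are hypothetical.

```python
def process_status(is_up: bool, uptime_s: float, cpu_pct: float, mem_pct: float) -> str:
    """Classify one process from its state, uptime, CPU share, and memory share."""
    # Critical: the process is down or uses more than 75% of CPU or memory.
    if not is_up or cpu_pct > 75 or mem_pct > 75:
        return "Critical"
    # Normal: running for over an hour with both shares below 50%.
    if uptime_s > 3600 and cpu_pct < 50 and mem_pct < 50:
        return "Normal"
    # Recently restarted, or a share in the 50 to 75 band.
    return "Warning"
```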

Event Pipeline

Collector only

This indicates whether queues are building up in Collectors.

  • Normal - if eventUploadQueue.total_mb less than 20

  • Warning - if eventUploadQueue.total_mb between 20 and 50

  • Critical - if eventUploadQueue.total_mb greater than 50

Event Pipeline

Worker only

This indicates whether queues are building up in Workers, which may be caused by Workers being slow to ingest events into storage.

  • Normal - if eventUploadQueue.disk_mb less than 25

  • Warning - if eventUploadQueue.disk_mb between 25 and 75

  • Critical - if eventUploadQueue.disk_mb greater than 75

Shared Store

Worker, Supervisor

This indicates that some FortiSIEM processes are slow in processing events and may eventually block the writer phParser process from ingesting events. Events may eventually be lost.

  • Normal - if the difference between the reader's and the writer's sharedStore_pct is less than 15

  • Warning - if the difference between the reader's and the writer's sharedStore_pct is between 15 and 30

  • Critical - if the difference between the reader's and the writer's sharedStore_pct is more than 30
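The Shared Store check is based on a gap between two values rather than on a single reading. The sketch below shows that comparison for one reader and one writer. It assumes the gap is the absolute difference of the two sharedStore_pct values and that boundary values count as Warning; neither detail is stated in the table, and the function name is hypothetical.

```python
def shared_store_status(reader_pct: float, writer_pct: float) -> str:
    """Classify shared store health from the reader/writer position gap."""
    # How far the reader lags the writer, in percentage points (assumed absolute).
    gap = abs(writer_pct - reader_pct)
    if gap < 15:
        return "Normal"
    if gap > 30:
        return "Critical"
    return "Warning"
```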

Last Status Updated

All nodes

This is based on the health updates between Collector and Supervisor; Worker and Supervisor; and Instance Supervisor and FortiSIEM Manager.

  • Normal - if the delay since nodes.metrics.lastStatusUpdated is less than 15 minutes

  • Warning - if the delay since nodes.metrics.lastStatusUpdated is between 15 and 20 minutes

  • Critical - if the delay since nodes.metrics.lastStatusUpdated is more than 20 minutes

Last Event Time

Collector

This information is sent by each Worker to the Supervisor based on what each Worker receives from Collectors. It detects whether Collectors are falling behind in sending events to Workers. A delay may be caused by Workers being slow to ingest events into storage or by Collectors being slow to process events and upload them to Workers.

  • Normal - if the delay since nodes.metrics.lastEventTime is less than 15 minutes

  • Warning - if the delay since nodes.metrics.lastEventTime is between 15 and 20 minutes

  • Critical - if the delay since nodes.metrics.collectorUploadStatus.lastEventTime is more than 20 minutes

Last File Received

Collector

This information is sent by each Worker to the Supervisor based on what each Worker receives from Collectors. A delay may be caused by Workers being slow to ingest events into storage or by Collectors being slow to process events and upload them to Workers.

  • Normal - if the delay since nodes.metrics.lastFileReceived is less than 15 minutes

  • Warning - if the delay since nodes.metrics.lastFileReceived is between 15 and 20 minutes

  • Critical - if the delay since nodes.metrics.lastFileReceived is more than 20 minutes
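The three timestamp rows (Last Status Updated, Last Event Time, Last File Received) share the same 15-minute and 20-minute limits, so a single staleness check covers them. A minimal sketch, assuming the timestamp has already been parsed into a timezone-aware `datetime` and that boundary values count as Warning; the function name is hypothetical and the Health JSON timestamp format is not shown here.

```python
from datetime import datetime, timedelta, timezone


def delay_status(last_seen: datetime, now: datetime) -> str:
    """Classify how stale a timestamp is relative to the current time."""
    delay = now - last_seen
    if delay < timedelta(minutes=15):
        return "Normal"
    if delay > timedelta(minutes=20):
        return "Critical"
    # Delays from 15 to 20 minutes inclusive are treated as Warning (an assumption).
    return "Warning"


# Example: a Collector whose last file arrived 17 minutes ago.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
status = delay_status(now - timedelta(minutes=17), now)  # "Warning"
```

Passing `now` explicitly keeps the check deterministic, which makes it easier to test than calling the system clock inside the function.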
