Integration API Guide

Current Thresholds for Health Status

The following table lists the thresholds that determine whether a Health JSON attribute is reported as Normal, Warning, or Critical.

Health JSON Attribute

Applicability

Threshold

CPU Utilization

All nodes

  • Normal - if cpuUsage.used_pct less than 75 AND loadAverage 15 minutes less than nodes.hardware.vCPU

  • Warning - if cpuUsage.used_pct between 75 and 90 OR loadAverage 15 minutes between nodes.hardware.vCPU and 2*nodes.hardware.vCPU

  • Critical - if cpuUsage.used_pct greater than 90 OR loadAverage 15 minutes greater than 2*nodes.hardware.vCPU
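The CPU rule combines two signals, so it can help to see it written out. The following is a minimal sketch, not FortiSIEM code: the function name `cpu_status` and its parameters are hypothetical, the inputs are assumed to be already extracted from the Health JSON, and values that fall exactly on a boundary (75, 90, vCPU, 2*vCPU) are assumed to count as Warning because the table does not say.

```python
def cpu_status(used_pct: float, load_avg_15m: float, vcpu: int) -> str:
    """Classify CPU health from utilization and the 15-minute load average.

    Hypothetical helper: names and boundary handling are assumptions.
    """
    # Critical: either signal is above its upper bound.
    if used_pct > 90 or load_avg_15m > 2 * vcpu:
        return "Critical"
    # Normal: both signals are below their lower bound.
    if used_pct < 75 and load_avg_15m < vcpu:
        return "Normal"
    # Everything in between, boundary values included (assumption), is Warning.
    return "Warning"


print(cpu_status(40.0, 2.5, 8))   # Normal
print(cpu_status(80.0, 2.5, 8))   # Warning
print(cpu_status(40.0, 17.0, 8))  # Critical
```

Checking Critical first means a node with low utilization but a very high load average is still flagged, which matches the OR in the Critical rule.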

Memory Utilization

All nodes

  • Normal - if memoryUsage.used_pct less than 75

  • Warning - if memoryUsage.used_pct between 75 and 90

  • Critical - if memoryUsage.used_pct greater than 90
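Rows that depend on a single metric (memory, swap, disk, and the event upload queues) all share the same two-threshold shape. A minimal sketch of that shape follows, assuming the metric value has already been read from the Health JSON. The names `band_status` and `BANDS` are hypothetical, the threshold pairs are copied from this table, and boundary values are assumed to be Warning.

```python
def band_status(value: float, warn_at: float, crit_above: float) -> str:
    """Map one metric value to Normal, Warning, or Critical."""
    if value > crit_above:
        return "Critical"
    if value < warn_at:
        return "Normal"
    # Boundary values are treated as Warning (assumption).
    return "Warning"


# (warn_at, crit_above) pairs taken from the table rows.
BANDS = {
    "memoryUsage.used_pct": (75, 90),
    "swapUsage.in_bps": (750_000, 1_000_000),
    "diskUsage.used_pct": (65, 85),
    "eventUploadQueue.total_mb": (20, 50),  # Collector only
    "eventUploadQueue.disk_mb": (25, 75),   # Worker only
}

print(band_status(62.0, *BANDS["memoryUsage.used_pct"]))  # Normal
print(band_status(70.0, *BANDS["diskUsage.used_pct"]))    # Warning
print(band_status(1_200_000, *BANDS["swapUsage.in_bps"]))  # Critical
```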

Swap Space Utilization

All nodes

  • Normal - if swapUsage.in_bps less than 750,000

  • Warning - if swapUsage.in_bps between 750,000 and 1,000,000

  • Critical - if swapUsage.in_bps greater than 1,000,000

Disk Utilization

All nodes; skips /data, data-clickhouse

  • Normal - if diskUsage.used_pct less than 65

  • Warning - if diskUsage.used_pct between 65 and 85

  • Critical - if diskUsage.used_pct greater than 85

I/O Utilization

All nodes

  • Normal - if cpuUsage.ioWait_pct less than 1 AND diskIO.readWait_ms less than 15 AND diskIO.writeWait_ms less than 15 AND nfsIO.readLatency_ms less than 50 AND nfsIO.writeLatency_ms less than 50

  • Warning - if cpuUsage.ioWait_pct between 1 and 5 OR diskIO.readWait_ms between 15 and 25 OR diskIO.writeWait_ms between 15 and 25 OR nfsIO.readLatency_ms between 50 and 75 OR nfsIO.writeLatency_ms between 50 and 75

  • Critical - if cpuUsage.ioWait_pct more than 5 OR diskIO.readWait_ms more than 25 OR diskIO.writeWait_ms more than 25 OR nfsIO.readLatency_ms more than 75 OR nfsIO.writeLatency_ms more than 75
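One way to read the I/O rule is "the worst individual metric wins": Normal requires every metric to be below its lower bound, Critical requires any one metric to be above its upper bound, and anything else is Warning. The sketch below implements that reading. It is an interpretation, not FortiSIEM code: `io_status` and `IO_BOUNDS` are hypothetical names, the input is assumed to be a flat mapping of metric name to value, metrics missing from the sample (for example NFS latency on a node without NFS) are assumed to be skipped, and boundary values are assumed to be Warning.

```python
SEVERITY = ["Normal", "Warning", "Critical"]

# (lower bound, upper bound) per metric, taken from the table row.
IO_BOUNDS = {
    "cpuUsage.ioWait_pct": (1, 5),
    "diskIO.readWait_ms": (15, 25),
    "diskIO.writeWait_ms": (15, 25),
    "nfsIO.readLatency_ms": (50, 75),
    "nfsIO.writeLatency_ms": (50, 75),
}


def io_status(sample: dict) -> str:
    """Return the worst severity across all I/O metrics present in sample."""
    worst = 0
    for name, (low, high) in IO_BOUNDS.items():
        if name not in sample:
            continue  # assumption: absent metrics do not affect the status
        value = sample[name]
        if value > high:
            level = 2
        elif value < low:
            level = 0
        else:
            level = 1
        worst = max(worst, level)
    return SEVERITY[worst]


print(io_status({"cpuUsage.ioWait_pct": 0.4, "diskIO.readWait_ms": 3.0}))   # Normal
print(io_status({"cpuUsage.ioWait_pct": 0.4, "diskIO.writeWait_ms": 18.0}))  # Warning
print(io_status({"cpuUsage.ioWait_pct": 0.4, "nfsIO.readLatency_ms": 90.0}))  # Critical
```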

Process Health

All nodes

  • Normal - if processStat.uptime more than 1 hour AND processStat.cpuUtil_pct less than 50 AND processStat.memoryUtil_pct less than 50

  • Warning - if processStat.uptime less than 1 hour OR processStat.cpuUtil_pct between (50 and 75) OR processStat.memoryUtil_pct between (50 and 75)

  • Critical - if process is DOWN OR processStat.cpuUtil_pct greater than 75 OR processStat.memoryUtil_pct greater than 75
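The Process Health rule mixes a liveness check, an uptime check, and two utilization checks. A minimal sketch of the evaluation order is shown below. The function `process_status` and its parameters are hypothetical, uptime is assumed to be expressed in seconds (the table does not state the unit of processStat.uptime), and boundary values are assumed to be Warning.

```python
ONE_HOUR_S = 3600  # assumption: uptime is measured in seconds


def process_status(is_up: bool, uptime_s: float,
                   cpu_pct: float, mem_pct: float) -> str:
    """Classify a single process using the Process Health thresholds."""
    # A process that is down is Critical regardless of other values.
    if not is_up or cpu_pct > 75 or mem_pct > 75:
        return "Critical"
    # Normal needs all three conditions at once.
    if uptime_s > ONE_HOUR_S and cpu_pct < 50 and mem_pct < 50:
        return "Normal"
    # Recently restarted or moderately loaded processes land here.
    return "Warning"


print(process_status(True, 86_400, 12.0, 20.0))  # Normal
print(process_status(True, 600, 12.0, 20.0))     # Warning
print(process_status(False, 0, 0.0, 0.0))        # Critical
```

Under this reading, a recently restarted process reports Warning even when its CPU and memory use are low, since uptime under one hour is itself a Warning condition.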

Event Pipeline

Collector only

This indicates whether queues are building up in Collectors.

  • Normal - if eventUploadQueue.total_mb less than 20

  • Warning - if eventUploadQueue.total_mb between 20 and 50

  • Critical - if eventUploadQueue.total_mb greater than 50

Event Pipeline

Worker only

This indicates whether queues are building up in Workers, which may be caused by Workers being slow to ingest events into storage.

  • Normal - if eventUploadQueue.disk_mb less than 25

  • Warning - if eventUploadQueue.disk_mb between 25 and 75

  • Critical - if eventUploadQueue.disk_mb greater than 75

Shared Store

Worker, Supervisor

This indicates that some FortiSIEM processes are slow to process events and may eventually block the writer phParser process from ingesting events. If that happens, events may be lost.

  • Normal - if difference between reader and writer's sharedStore_pct is less than 15

  • Warning - if difference between reader and writer's sharedStore_pct is between 15 and 30

  • Critical - if difference between reader and writer's sharedStore_pct is more than 30
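The Shared Store rule is based on a gap between two positions rather than a single value. The sketch below illustrates one possible computation under several assumptions: there is one writer value and one or more reader values, the gap is the plain absolute difference of their sharedStore_pct values (any wrap-around behavior is ignored), and the slowest reader determines the status. The function name `shared_store_status` is hypothetical.

```python
def shared_store_status(writer_pct: float, reader_pcts: list) -> str:
    """Classify shared store health from the largest reader/writer gap."""
    # Largest gap across all readers; 0.0 when there are no readers.
    gap = max((abs(writer_pct - r) for r in reader_pcts), default=0.0)
    if gap > 30:
        return "Critical"
    if gap < 15:
        return "Normal"
    return "Warning"  # gap of 15 to 30, boundaries included (assumption)


print(shared_store_status(60.0, [58.0, 55.0]))  # Normal
print(shared_store_status(60.0, [58.0, 40.0]))  # Warning
print(shared_store_status(60.0, [20.0]))        # Critical
```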

Last Status Updated

All nodes

This is based on the health updates between Collector and Supervisor; Worker and Supervisor; and Instance Supervisor and FortiSIEM Manager.

  • Normal - if nodes.metrics.lastStatusUpdated less than 5 minute delay

  • Warning - if nodes.metrics.lastStatusUpdated between 5 and 10 minute delay

  • Critical - if nodes.metrics.lastStatusUpdated more than 10 minute delay
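The time-based rows compare a timestamp against the current time. A minimal sketch of that comparison follows. It assumes the timestamp has already been parsed into a timezone-aware `datetime` (the wire format of nodes.metrics.lastStatusUpdated is not described here), and the function name `delay_status` is hypothetical. Delays of exactly 5 or 10 minutes are assumed to be Warning.

```python
from datetime import datetime, timedelta, timezone


def delay_status(last_seen: datetime, now: datetime) -> str:
    """Classify how stale a timestamp is relative to now."""
    delay = now - last_seen
    if delay > timedelta(minutes=10):
        return "Critical"
    if delay < timedelta(minutes=5):
        return "Normal"
    return "Warning"


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(delay_status(now - timedelta(minutes=2), now))   # Normal
print(delay_status(now - timedelta(minutes=7), now))   # Warning
print(delay_status(now - timedelta(minutes=30), now))  # Critical
```

Passing `now` explicitly keeps the function deterministic, which makes it easier to test than calling `datetime.now()` inside it.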

Last Event Time

Collector

This information is sent by each Worker to the Supervisor, based on what the Worker receives from Collectors. It detects whether Collectors are falling behind in sending events to Workers. A delay may be caused by Workers being slow to ingest events into storage, or by Collectors being slow to process events and upload them to Workers.

  • Normal - if nodes.metrics.lastEventTime less than 5 minute delay

  • Warning - if nodes.metrics.lastEventTime between 5 and 10 minute delay

  • Critical - if nodes.metrics.collectorUploadStatus.lastEventTime more than 10 minute delay

Last File Received

Collector

This information is sent by each Worker to the Supervisor, based on what the Worker receives from Collectors. A delay may be caused by Workers being slow to ingest events into storage, or by Collectors being slow to process events and upload them to Workers.

  • Normal - if nodes.metrics.lastFileReceived less than 5 minute delay

  • Warning - if nodes.metrics.lastFileReceived between 5 and 10 minute delay

  • Critical - if nodes.metrics.lastFileReceived more than 10 minute delay

 
