
User Guide

Forecasting

The objective of a Forecasting Task is to learn the time-based trend of a field in a dataset and then predict future values of that field. Learning the time-based trend is done during the Training phase, and future prediction is done during the Inference phase. The dataset for both the Training and Inference phases is provided by running FortiSIEM reports.

Forecasting Algorithms for Local Mode

In this mode, the following algorithms can run locally within the FortiSIEM Super/Worker cluster:

  • ARIMA
  • State Space Dynamic Factor MQ

Running Forecasting Local Mode

Step 1: Design

First identify the following items:

  • Field to Forecast: Must be a numerical field.
  • Fields to use for Forecasting: Each field must be a numerical field. The Field to Forecast can be the same as a Field to use for Forecasting, meaning that previous values of a field are used to forecast future values of the same field.
  • Id Field: Identifies who the data is for.
    Note: The Id Field is optional.
  • Date Field: Can be hourly or daily.
  • A FortiSIEM Report to get this data.

To provide several samples of the data, you can choose one of the following time attributes as a report column:

  • Event Receive Hour
  • Event Receive Date

Requirements

  1. The report must contain:
    • Date Field - Event Receive Hour or Event Receive Date
    • One numerical field to forecast
    • (Optional) Other fields to use for forecasting
  2. Each field must be present in the report result; otherwise, the whole row is ignored by the machine learning algorithm.
  3. Additional columns in the report are ignored by the machine learning algorithm. However, it is recommended to remove unnecessary columns to reduce the size of the dataset exchanged between the App Server and phAnomaly modules during Training and Inference.

Go to Analytics > Search and run various reports. Once you have the right report, save it in Resources > Machine Learning Jobs.
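The row-handling rules above (a row missing any required field is dropped; extra columns are ignored) can be sketched as a simple pre-filter. This is an illustrative sketch only: the column names ("Event Receive Hour", "Host Name", "CPU Util") are hypothetical examples, not the exact FortiSIEM attribute names, and the real filtering happens inside the phAnomaly module.

```python
# Hypothetical required columns for a forecasting dataset:
# a date field, an optional id field, and the numeric field to forecast.
REQUIRED_FIELDS = ["Event Receive Hour", "Host Name", "CPU Util"]

def filter_rows(rows):
    """Keep only rows where every required field is present and non-empty.

    Mirrors the documented behavior: a row missing any required field is
    ignored by the machine learning algorithm; extra columns are simply unused.
    """
    return [
        r for r in rows
        if all(r.get(f) not in (None, "") for f in REQUIRED_FIELDS)
    ]

rows = [
    {"Event Receive Hour": "2024-01-01 10:00", "Host Name": "web1",
     "CPU Util": 42.0, "Extra Column": "ignored"},
    {"Event Receive Hour": "2024-01-01 11:00", "Host Name": "web1",
     "CPU Util": None},   # missing value -> whole row dropped
    {"Event Receive Hour": "2024-01-01 12:00", "Host Name": "web1",
     "CPU Util": 55.5},
]
print(len(filter_rows(rows)))  # 2 usable rows
```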

Step 2: Prepare Data

Prepare the data for training.

  1. Go to Analytics > Machine Learning, and click the Import Machine Learning Jobs (open folder) icon.
  2. Select the data source in one of three ways:
    1. To prepare data from a Machine Learning Job, choose Import via Jobs and select the Job, which has an associated Report and algorithm.
    2. To prepare data from the Report folder, choose Import via Report and select the report from the Resources > Machine Learning Jobs folder.
    3. To prepare data from a CSV file, choose Import via CSV File and upload the file. In this mode, you can see how the Training algorithm performs, but you cannot schedule inference, since the data may not be present in FortiSIEM.
  3. For Case 2a and 2b, select the Report Time Range and the Organization for Service Provider Deployments.
  4. Click Run. The results are displayed in Machine Learning > Prepare tab.
    Note: For FortiSIEM with ClickHouse deployments, Historical mode searches include a Result Filter panel that provides a top and bottom 100 list (with a toggle to switch between them) for the attributes that are part of your Display Field search. An Add to Main Filter icon is available to hone search results and identify trends relating the selected attributes to other attributes.

Step 3: Train

Train the Forecasting task using the dataset in Step 2.

  1. Go to Analytics > Machine Learning > Train.
  2. If you chose Import via Jobs, then make sure the Date Field, Field(s) to use for Forecasting, and Id Field are populated correctly.
  3. If you chose Import via Report or Import via CSV File, then:
    1. Set Run Mode to Local
    2. Set Task to Forecasting
    3. Choose the Algorithm
    4. Choose Id Field
    5. If the Algorithm is ARIMA:
      1. Choose the Field to Forecast from the report fields. The current ARIMA implementation uses previous values of a field to forecast future values of the same field.
      2. Set the Steps parameter to specify how many future time steps to forecast. The default value is 5; it can be changed by clicking the Settings icon next to the Algorithm.
    6. If the Algorithm is State Space Dynamic Factor MQ:
      1. Choose the Field to Forecast and Fields to use for Forecasting from the report fields.
      2. Set the Steps parameter to specify how many future time steps to forecast. The default value is 5; it can be changed by clicking the Settings icon next to the Algorithm.
  4. Choose the Train factor, which should be at least 70%. A Train factor of 70% means that 70% of the data is used for Training and 30% for Testing.
  5. Click Train.
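The two ideas behind the Train step can be sketched in a few lines: a chronological 70/30 split (the Train factor; time series data is split in order, not shuffled) and a multi-step forecast that uses previous values of a field to predict its future values (the Steps parameter). This is a minimal plain AR(1) model fit by least squares, used only to illustrate the workflow; it is not the actual ARIMA or State Space Dynamic Factor MQ implementation.

```python
def fit_ar1(series):
    """Fit y[t] = a + b * y[t-1] by ordinary least squares."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
        / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b

def forecast(series, steps):
    """Forecast `steps` future values (the Steps parameter, default 5),
    feeding each prediction back in as the next step's input."""
    a, b = fit_ar1(series)
    out, last = [], series[-1]
    for _ in range(steps):
        last = a + b * last
        out.append(last)
    return out

# Hypothetical hourly values of the field to forecast.
data = [10.0, 12.0, 13.0, 15.0, 16.0, 18.0, 19.0, 21.0, 22.0, 24.0]
split = int(len(data) * 0.7)              # Train factor of 70%
train, test = data[:split], data[split:]  # chronological, not shuffled
print(forecast(train, steps=5))
```

The held-out `test` portion is what the Model Quality metrics in the Train > Output tab are computed against.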

After Training completes, the results are shown in the Train > Output tab.

Model Quality:

This shows how accurately the algorithm is able to predict the field. The following metrics are calculated:

  • Max Error: Maximum absolute difference between predicted and actual values over all data points. A lower value means that the regression is a better fit.
  • Mean Absolute Error (MAE): Average of the absolute differences between predicted and actual values over all data points. A lower value means that the regression is a better fit.
  • Mean Squared Error (MSE): Average of the squared differences between predicted and actual values over all data points. A lower value means that the regression is a better fit.
  • R2 Score: A statistical measure of how well the regression predictions approximate the real data points. An R2 of 1 indicates that the regression predictions perfectly fit the data; an R2 of 0.8 means that 80% of the variation in the predicted attribute is explained by the feature attributes.
  • Root Mean Squared Error (RMSE): Square root of the Mean Squared Error. A lower value means that the regression is a better fit. RMSE is affected by the scale of the data and can be heavily influenced by a few predictions that are much worse than the rest.
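The five metrics above follow directly from their definitions, so they can be computed in plain Python. This sketch is for understanding the numbers in the Output tab, not a reproduction of FortiSIEM's internal implementation.

```python
import math

def model_quality(actual, predicted):
    """Compute the five Model Quality metrics from their definitions."""
    errs = [p - a for a, p in zip(actual, predicted)]
    n = len(errs)
    max_error = max(abs(e) for e in errs)         # worst single prediction
    mae = sum(abs(e) for e in errs) / n           # Mean Absolute Error
    mse = sum(e * e for e in errs) / n            # Mean Squared Error
    rmse = math.sqrt(mse)                         # Root Mean Squared Error
    mean_a = sum(actual) / n
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    ss_res = sum(e * e for e in errs)
    r2 = 1 - ss_res / ss_tot                      # R2 Score
    return {"max_error": max_error, "mae": mae, "mse": mse,
            "rmse": rmse, "r2": r2}

# Toy example with made-up values.
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 17.0]
print(model_quality(actual, predicted))
```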

The Forecasting result is shown in two ways:

  • Tabular form: The predicted values are shown in the first few rows.
  • Trend form: Shows how the algorithm learns the values over time and forecasts future values.

If you want to change the algorithm parameters and re-train, click Tune & Train, change the parameters, and click Save & Train.

Step 4: Schedule

Once the training is complete, you can schedule the job for Inference.

  • The Input Details section shows the Report and the Organization chosen for the report. These were chosen during the Prepare phase and will be used during Inference.
  • Algorithm Setup shows the Machine Learning Algorithm and its parameters. These were chosen during the Train phase and will be used during Inference.
  • Schedule Setup shows the Job details and schedules:
    • Job Id: Specifies the unique Job Id. If it is a system job, it is overwritten with a new Job Id when it is saved as a User job. If it is a User job, you can save it as a new User job with a different Job Id or keep the same Job Id.
    • Job Name: Name of the job. You can change it. If a job with the same name already exists, a date stamp is appended.
    • Job Description: Description of the job.
    • Inference schedule: The frequency at which the Inference job runs.
    • Retraining schedule: The frequency at which the model is retrained. Retraining is expensive and should be carefully considered. The recommended retraining interval is at least 7 days.
    • (Retraining) Report Window: The report time window used during retraining. A long time window may cause the report to run slowly, so this should also be carefully considered. It is recommended to choose the same time window used during the Prepare process.
    • Job Group: Shows the folder under Resources > Machine Learning Jobs where this job will be saved.
  • Action on Inference: Specifies the action to be taken when an anomaly is found during the Inference process.
    • One choice is available – Send email(s) by entering the email address(es) in the Send email to field. Make sure that the email server is specified in Admin > Settings > Email.
    • Check Enabled to ensure that Inference is enabled.

Finally, click Save to save the job to the database. If it is a system job, a new User job is created. If it is a User job, you can save it as a new User job with a different Job Id or overwrite the current job.
