
User Guide

Classification

The objective of a Classification task is to learn how to assign labels to items based on various fields in the dataset, and then to assign a label to a new item based on its current values. This requires labels to be present in the dataset. Labels can be binary, e.g. malware/not-malware or spam/not-spam, or can belong to more than two classes. Learning the label assignment is done during the Training phase, and assigning labels to new data is done during the Inference phase. The dataset for both the Training and Inference phases is provided by running FortiSIEM reports.

Classification Algorithms for Local Mode

In this mode, the following algorithms can run locally within the FortiSIEM Supervisor/Worker cluster.

  • Decision Tree Classifier: A supervised classification algorithm that uses a Decision Tree constructed using the feature variables to classify new data points. It requires a set of pre-labelled data points during the training process to construct the tree.
  • Logistic Regression: A supervised binary classification algorithm that can classify data points into two classes based on sigmoid function using a set of features. It requires a set of pre-labelled data points during the training process.
  • Random Forest Classifier: A supervised classification algorithm that uses Ensemble learning and Bootstrapping techniques to improve the accuracy of Decision Tree based classification algorithms. Ensemble learning uses multiple models and Bootstrapping randomly samples datasets and then averages the results of each model to improve accuracy. It requires a set of pre-labelled data points during the training process to construct the trees.
  • SGDClassifier: A supervised classification algorithm that uses Stochastic Gradient Descent (SGD) update techniques for existing classifiers and is computationally efficient for large datasets. It requires a set of pre-labelled data points during the training process.
  • Support Vector Classifier: Support Vector Classifier (SVC), also called Linear Support Vector Machine (SVM), is a supervised classification algorithm that separates data points using a hyperplane with specified margins (support vector). It requires a set of pre-labelled data points during the training process.
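
FortiSIEM runs these algorithms internally, but the same algorithm families are available in scikit-learn, which can be used to sketch what the Training phase does. This is a minimal illustration on synthetic data; FortiSIEM's actual implementation and parameters may differ.

```python
# Sketch only: illustrates two of the listed algorithm families with
# scikit-learn on synthetic data. Not FortiSIEM's internal implementation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Pre-labelled numerical data: rows of feature values plus a class label.
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=0)  # 70% Training / 30% Testing

for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(n_estimators=100, random_state=0)):
    model.fit(X_train, y_train)          # Training phase
    score = model.score(X_test, y_test)  # accuracy on the held-out 30%
    print(type(model).__name__, round(score, 2))
```

The 70/30 split mirrors the Train factor described later in Step 3.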

Running Classification Local Mode

Step 1: Design

First identify the following items:

  • Fields to use for Classification: Each field must be a numerical field.
  • Class Label: The class to which the data corresponds.
  • A FortiSIEM Report to get this data.

To provide several samples of the data, you can choose one of the following time attributes as a report column:

  • Event Receive Hour
  • Event Receive Date

Requirements

  1. Report must contain
    • A Class Label
    • One or more numerical fields to use for classification
    To provide several samples, you can provide a time field. This is optional.
  2. Each field must be present in the report result; otherwise the whole row will be ignored by the Machine Learning algorithm.
  3. There can be additional columns in the report, and they will be ignored by the machine learning algorithm. However, it is recommended to remove unnecessary columns from the dataset to reduce the size of the dataset exchanged between the App Server and phAnomaly modules during Training and Inference.
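
Requirement 2 (rows with any missing field are ignored) and requirement 3 (extra columns are tolerated but inflate the dataset) can be pictured with a small pandas sketch. The column names here are hypothetical examples, not actual FortiSIEM field names.

```python
# Sketch of the row-filtering rule in requirement 2, using pandas.
# Column names ("label", "bytes_sent", "event_count") are hypothetical.
import pandas as pd

report = pd.DataFrame({
    "label":       [0, 1, None, 1],
    "bytes_sent":  [1200, 3400, 560, None],
    "event_count": [5, 9, 2, 7],
    "host_name":   ["a", "b", "c", "d"],   # extra column, ignored by ML
})

required = ["label", "bytes_sent", "event_count"]
usable = report.dropna(subset=required)    # rows missing any required field are dropped
print(len(usable))                          # 2 of 4 rows survive
```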

Go to Analytics > Search and run various reports. Once you have the right report, save it in Resources > Machine Learning Jobs.

Step 2: Prepare Data

Prepare the data for training.

  1. Go to Analytics > Machine Learning, and click the Import Machine Learning Jobs (open folder) icon.
  2. Select the data source in one of three ways:
    1. To prepare data from a Machine Learning Job, choose Import via Jobs and select the Job which has associated Report and algorithm.
    2. To prepare data from the Report folder, choose Import via Report and select the report from the Resources > Machine Learning Job folder.
    3. To prepare data from a CSV file, choose Import via CSV File and upload the file. In this mode, you can see how the Training algorithm performs, but you cannot schedule for inference, since the data may not be present in FortiSIEM.
  3. For Case 2a and 2b, select the Report Time Range and the Organization for Service Provider Deployments.
  4. Click Run. The results are displayed in Machine Learning > Prepare tab.

Step 3: Train

Train the Classification task using the dataset in Step 2.

  1. Go to Analytics > Machine Learning > Train.
  2. If you chose Import via Jobs, then make sure the Class Label and Fields to use for Classification are populated correctly.
  3. If you chose Import via Report or Import via CSV File, then
    1. Set Run Mode to Local
    2. Set Task to Classification
    3. Choose the Algorithm
    4. Choose the Class Label and Fields to use for Classification from the report fields.
  4. Choose the Train factor, which should be at least 70%. For example, a Train factor of 70% means that 70% of the data will be used for Training and 30% for Testing.
  5. Click Train.

After you have completed the Training, the results are shown in the Train > Output tab.

Model Quality:

The following metrics show the quality of the classifier.

  • True Positives (TP): Correct classification for label = 0, i.e. actual = 0 and predicted = 0
  • True Negative (TN): Correct classification for label = 1, i.e. actual = 1 and predicted = 1
  • False Positive (FP): Incorrect classification for label = 0, i.e. actual = 0 and predicted = 1
  • False Negative (FN): Incorrect classification for label = 1, i.e. actual = 1 and predicted = 0
  • Accuracy: Accuracy is the total number of correct classifications divided by the total number of attempted classifications, i.e. ((TP+TN)/(TP+TN+FP+FN)). It is between 0 and 1. A score close to 1 means accurate classification.
  • Recall: Recall is calculated by dividing the True Positives by everything that should have been predicted as Positive. So Recall is TP/(TP+FN).
  • Precision: Precision is calculated by dividing the True Positives by anything that was classified as a Positive. So Precision is TP/(TP+FP).
  • F1 Score: F1 Score combines Precision and Recall into a single metric by taking their harmonic mean. F1 Score = 2 / ((1/Precision)+(1/Recall)). It ranges from 0-1, and a higher F1 score denotes a better quality classifier.
  • ROC AUC: Measures the entire two-dimensional area underneath the ROC curve (think integral calculus) from (0,0) to (1,1). It tells how well the model is capable of distinguishing between classes. Higher ROC AUC values indicate that the model is better at predicting 0 classes as 0 and 1 classes as 1.
  • Confusion Matrix: Presents TP, TN, FP and FN in a matrix.
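
The metric formulas above can be checked with a few lines of arithmetic on illustrative confusion-matrix counts (not from a real FortiSIEM run):

```python
# Sketch: computing the listed metrics from confusion-matrix counts.
tp, tn, fp, fn = 40, 45, 5, 10                    # illustrative counts

accuracy  = (tp + tn) / (tp + tn + fp + fn)       # (TP+TN)/(TP+TN+FP+FN)
recall    = tp / (tp + fn)                        # TP/(TP+FN)
precision = tp / (tp + fp)                        # TP/(TP+FP)
f1        = 2 / ((1 / precision) + (1 / recall))  # harmonic mean

confusion_matrix = [[tp, fn],
                    [fp, tn]]

print(accuracy, recall, round(precision, 3), round(f1, 3))
```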

If you want to change the algorithm parameters and re-train, then click Tune & Train, change the parameters and click Save & Train.

Step 4: Schedule

Once the training is complete, you can schedule the job for Inference.

  • Input Details section shows the Report and the Org chosen for the report. These were already chosen during Prepare phase and will be used during Inference.
  • Algorithm Setup shows the Machine Learning Algorithm and its parameters. These were already chosen during Train phase and will be used during Inference.
  • Schedule Setup shows the Job details and schedules
    • Job Id: Specifies the unique Job Id. If it is a system job, it will be overwritten with a new job id when it is saved as a User job. If it is a User job, then the user has the option to save it as a new user job with a different job id or to keep the same job id.
    • Job Name: Name of the job. You can overwrite this one. If a job with the same name already exists, a date stamp will be appended.
    • Job Description: Description of the job.
    • Inference schedule: The frequency at which the Inference job will be run.
    • Retraining schedule: The frequency at which the model will be retrained. Retraining is expensive and should be carefully considered. A retraining interval of at least 7 days is recommended.
    • (Retraining) Report Window: The report time window used during the retraining process. A long time window may cause the report to run slowly, so this should also be carefully considered. It is recommended to choose the same time window chosen during the Prepare process.
    • Job Group: Shows the folder under Resources > Machine Learning Jobs where this job will be saved.
  • Action on Inference: Specifies the action to be taken when an anomaly is found during the Inference process.
    • Two choices are available: creating a FortiSIEM Incident or sending an email. Specify the email addresses if you want emails to be sent. Make sure that an email server is specified in Admin > Settings > Email.
    • Check Enabled to ensure that Inference is enabled.

Finally, click Save to save the job to the database. If it is a system job, a new User job will be created. If it is a User job, the user has the option to save it as a new user job with a different job id or to overwrite the current job.

Classification Algorithms for Local Auto Mode

In this mode, FortiSIEM picks the best algorithm from the following:

  • Decision Tree Classifier
  • Random Forest Classifier
  • SGDClassifier
  • Support Vector Classifier (SVC)

Note: The Max Run Time parameter limits the amount of time this job runs. By default, it is set to 5 minutes. The longer the job runs, the better the results it can potentially generate.

Running Classification Local Auto Mode

To run, follow the steps in Running Classification Local Mode, but in Step 3, substep 3a, select Run Mode as Local Auto.

Classification Algorithms for AWS Mode

In this mode, the algorithms run in AWS SageMaker.

Running Classification AWS Mode

Step 0: Set Up AWS

Set up AWS SageMaker by following the instructions in Set Up AWS SageMaker.

Configure AWS in FortiSIEM by following the instructions in Configure FortiSIEM to use AWS SageMaker.

Step 1: Design

First identify the following items:

  • Fields to use for Classification: Each field must be a numerical field.
  • Class Label: The class to which the data corresponds.
  • A FortiSIEM Report to get this data.

To provide several samples of the data, you can choose one of the following time attributes as a report column:

  • Event Receive Hour
  • Event Receive Date

Requirements

  1. Report must contain
    • A Class Label
    • One or more numerical fields to use for classification
    To provide several samples, you can provide a time field. This is optional.
  2. Each field must be present in the report result; otherwise the whole row will be ignored by the Machine Learning algorithm.
  3. There can be additional columns in the report, and they will be ignored by the machine learning algorithm. However, it is recommended to remove unnecessary columns from the dataset to reduce the size of the dataset exchanged between the App Server and phAnomaly modules during Training and Inference.

Go to Analytics > Search and run various reports. Once you have the right report, save it in Resources > Machine Learning Jobs.

Step 2: Prepare Data

Prepare the data for training.

  1. Go to Analytics > Machine Learning, and click the Import Machine Learning Jobs (open folder) icon.
  2. Select the data source in one of three ways:
    1. To prepare data from a Machine Learning Job, choose Import via Jobs and select the Job which has associated Report and algorithm.
    2. To prepare data from the Report folder, choose Import via Report and select the report from the Resources > Machine Learning Job folder.
    3. To prepare data from a CSV file, choose Import via CSV File and upload the file. In this mode, you can see how the Training algorithm performs, but you cannot schedule for inference, since the data may not be present in FortiSIEM.
  3. For Case 2a and 2b, select the Report Time Range and the Organization for Service Provider Deployments.
  4. Click Run. The results are displayed in Machine Learning > Prepare tab.

Step 3: Train

Train the Classification task using the dataset in Step 2.

  1. Go to Analytics > Machine Learning > Train.
  2. If you chose Import via Jobs, then make sure the Class Label and Fields to use for Classification are populated correctly.
  3. If you chose Import via Report or Import via CSV File, then
    1. Set Run Mode to AWS
    2. Set Task to Classification
    3. Choose the Algorithm
    4. Choose the Class Label and Fields to use for Classification from the report fields.
  4. Choose the Train factor, which should be at least 70%. For example, a Train factor of 70% means that 70% of the data will be used for Training and 30% for Testing.
  5. Click Train.

After you have completed the Training, the results are shown in the Train > Output tab.

Model Quality:

The following metrics show the quality of the classifier.

Binary Classification

  • True Positives (TP): Correct classification for label = 0, i.e. actual = 0 and predicted = 0
  • True Negative (TN): Correct classification for label = 1, i.e. actual = 1 and predicted = 1
  • False Positive (FP): Incorrect classification for label = 0, i.e. actual = 0 and predicted = 1
  • False Negative (FN): Incorrect classification for label = 1, i.e. actual = 1 and predicted = 0
  • Precision: The precision of the final model on the validation dataset. If you choose this metric as the objective, we recommend setting a target recall by setting the binary_classifier_model_selection hyperparameter to precision_at_target_recall and setting the value for the target_recall hyperparameter. This objective metric is only valid for binary classification.
  • Recall: The recall of the final model on the validation dataset. If you choose this metric as the objective, we recommend setting a target precision by setting the binary_classifier_model_selection hyperparameter to recall_at_target_precision and setting the value for the target_precision hyperparameter. This objective metric is only valid for binary classification.
  • roc_auc_score: The area under the receiver operating characteristic curve (ROC curve) of the final model on the validation dataset. This objective metric is only valid for binary classification.
  • binary_classification_accuracy: The accuracy of the final model on the validation dataset. This objective metric is only valid for binary classification.
  • binary_f_beta: The F-beta score of the final model on the validation dataset. By default, the F-beta score is the F1 score, which is the harmonic mean of the validation:precision and validation:recall metrics. This objective metric is only valid for binary classification.
  • Confusion Matrix: Presents TP, TN, FP and FN in a matrix

Multiclass Classification

  • Dcg: The discounted cumulative gain of the final model on the validation dataset. This objective metric is only valid for multiclass classification.
  • multiclass_accuracy: The accuracy of the final model on the validation dataset. This objective metric is only valid for multiclass classification.
  • multiclass_top_k_accuracy: The accuracy among the top k labels predicted on the validation dataset. If you choose this metric as the objective, we recommend setting the value of k using the accuracy_top_k hyperparameter. This objective metric is only valid for multiclass classification.
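
For reference, top-k accuracy can be sketched as follows. The probabilities and labels are illustrative, and k corresponds to the accuracy_top_k hyperparameter mentioned above.

```python
# Sketch: top-k accuracy for multiclass classification.
def top_k_accuracy(probabilities, labels, k=2):
    """Fraction of rows whose true label is among the k highest-scored classes."""
    hits = 0
    for probs, label in zip(probabilities, labels):
        top_k = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)

# 3-class example: per-row predicted class probabilities and true labels.
probs  = [[0.6, 0.3, 0.1],
          [0.2, 0.5, 0.3],
          [0.1, 0.2, 0.7]]
labels = [1, 2, 0]   # true classes

print(top_k_accuracy(probs, labels, k=2))   # 2 of 3 true labels in the top 2
```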

If you want to change the algorithm parameters and re-train, then click Tune & Train, change the parameters and click Save & Train.

Step 4: Schedule

Once the training is complete, you can schedule the job for Inference.

  • Input Details section shows the Report and the Org chosen for the report. These were already chosen during Prepare phase and will be used during Inference.
  • Algorithm Setup shows the Machine Learning Algorithm and its parameters. These were already chosen during Train phase and will be used during Inference.
  • Schedule Setup shows the Job details and schedules
    • Job Id: Specifies the unique Job Id. If it is a system job, it will be overwritten with a new job id when it is saved as a User job. If it is a User job, then the user has the option to save it as a new user job with a different job id or to keep the same job id.
    • Job Name: Name of the job. You can overwrite this one. If a job with the same name already exists, a date stamp will be appended.
    • Job Description: Description of the job.
    • Inference schedule: The frequency at which the Inference job will be run.
    • Retraining schedule: The frequency at which the model will be retrained. Retraining is expensive and should be carefully considered. A retraining interval of at least 7 days is recommended.
    • (Retraining) Report Window: The report time window used during the retraining process. A long time window may cause the report to run slowly, so this should also be carefully considered. It is recommended to choose the same time window chosen during the Prepare process.
    • Job Group: Shows the folder under Resources > Machine Learning Jobs where this job will be saved.
  • Action on Inference: Specifies the action to be taken when an anomaly is found during the Inference process.
    • Two choices are available: creating a FortiSIEM Incident or sending an email. Specify the email addresses if you want emails to be sent. Make sure that an email server is specified in Admin > Settings > Email.
    • Check Enabled to ensure that Inference is enabled.

Finally, click Save to save the job to the database. If it is a system job, a new User job will be created. If it is a User job, the user has the option to save it as a new user job with a different job id or to overwrite the current job.

Classification Algorithms for AWS Auto Mode

In this mode, FortiSIEM automatically chooses the best algorithm with the optimal parameters. Depending on the size of your dataset (whether it is greater or smaller than 100MB), algorithms from HPO mode or from ensembling mode will be considered. For more information, see https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-model-support-validation.html. The definitions below are taken from the Amazon SageMaker Developer Guide.

HPO mode (Dataset > 100MB)

  • Linear learner: A supervised learning algorithm that can solve either classification or regression problems.
  • XGBoost: A supervised learning algorithm that attempts to accurately predict a target variable by combining an ensemble of estimates from a set of simpler and weaker models.
  • Deep learning algorithm: A multilayer perceptron (MLP) and feedforward artificial neural network. This algorithm can handle data that is not linearly separable.

Ensembling mode (Dataset <=100MB)

  • LightGBM: An optimized framework that uses tree-based algorithms with gradient boosting. This algorithm uses trees that grow in breadth, rather than depth, and is highly optimized for speed.
  • CatBoost: A framework that uses tree-based algorithms with gradient boosting. Optimized for handling categorical variables.
  • XGBoost: A framework that uses tree-based algorithms with gradient boosting that grows in depth, rather than breadth.
  • Random Forest: A tree-based algorithm that uses several decision trees on random sub-samples of the data with replacement. The trees are split into optimal nodes at each level. The decisions of each tree are averaged together to prevent overfitting and improve predictions.
  • Extra Trees: A tree-based algorithm that uses several decision trees on the entire dataset. The trees are split randomly at each level. The decisions of each tree are averaged to prevent overfitting and to improve predictions. Extra trees add a degree of randomization in comparison to the random forest algorithm.
  • Linear Models: A framework that uses a linear equation to model the relationship between two variables in observed data.
  • Neural network torch: A neural network model that is implemented using PyTorch.
  • Neural network fast.ai: A neural network model that is implemented using fast.ai.
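
A simple way to picture ensembling mode is averaging the class-probability outputs of several models and thresholding the result. The numbers below are illustrative; SageMaker's actual model stacking and weighting is more sophisticated.

```python
# Sketch: the ensembling idea — average predicted probabilities across models.
model_probs = [                 # per-model probability of the positive class
    [0.9, 0.2, 0.6],            # e.g. a gradient-boosted tree model
    [0.8, 0.4, 0.5],            # e.g. a random forest
    [0.7, 0.1, 0.7],            # e.g. a linear model
]
avg = [sum(col) / len(model_probs) for col in zip(*model_probs)]
labels = [int(p >= 0.5) for p in avg]   # threshold the averaged probabilities
print(avg, labels)
```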

Note: The Max Run Time parameter limits the amount of time this job runs. By default, it is set to 225 minutes. The longer the job runs, the better the results it can potentially generate.

Running Classification AWS Auto Mode

To run, follow the steps in Running Classification AWS Mode, but in Step 3, substep 3a, select Run Mode as AWS Auto.

For Step 3, the following model quality information applies.

Model Quality:

The following metrics show the quality of the classifier.

Binary Classification

  • True Positives (TP): Correct classification for label = 0, i.e. actual = 0 and predicted = 0
  • True Negative (TN): Correct classification for label = 1, i.e. actual = 1 and predicted = 1
  • False Positive (FP): Incorrect classification for label = 0, i.e. actual = 0 and predicted = 1
  • False Negative (FN): Incorrect classification for label = 1, i.e. actual = 1 and predicted = 0
  • F1: The F1 score is the harmonic mean of the precision and recall, defined as follows: F1 = 2 * (precision * recall) / (precision + recall). It is used for binary classification into classes traditionally referred to as positive and negative. Predictions are said to be true when they match their actual (correct) class, and false when they do not. Precision is the ratio of the true positive predictions to all positive predictions, and it includes the false positives in a dataset. Precision measures the quality of the prediction when it predicts the positive class. Recall (or sensitivity) is the ratio of the true positive predictions to all actual positive instances. Recall measures how completely a model predicts the actual class members in a dataset. F1 scores vary between 0 and 1. A score of 1 indicates the best possible performance, and 0 indicates the worst.
  • LogLoss: Log loss, also known as cross-entropy loss, is a metric used to evaluate the quality of the probability outputs, rather than the outputs themselves. It is used in both binary and multiclass classification and in neural nets. It is also the cost function for logistic regression. Log loss is an important metric to indicate when a model makes incorrect predictions with high probabilities. Values range from 0 to infinity. A value of 0 represents a model that perfectly predicts the data.
  • Recall: Recall measures how well an algorithm correctly predicts all of the true positives (TP) in a dataset. A true positive is a positive prediction that is also an actual positive value in the data. Recall is defined as follows: Recall = TP/(TP+FN), with values ranging from 0 to 1. Higher scores reflect a better ability of the model to predict true positives (TP) in the data. It is used in binary classification.

    Recall is important when testing for cancer because it's used to find all of the true positives. A false positive (FP) reflects a positive prediction that is actually negative in the data. It is often insufficient to measure only recall, because predicting every output as a true positive yields a perfect recall score.
  • Precision: Precision measures how well an algorithm predicts the true positives (TP) out of all of the positives that it identifies. It is defined as follows: Precision = TP/(TP+FP), with values ranging from zero (0) to one (1), and is used in binary classification. Precision is an important metric when the cost of a false positive is high. For example, the cost of a false positive is very high if an airplane safety system is falsely deemed safe to fly. A false positive (FP) reflects a positive prediction that is actually negative in the data.
  • AUC: The area under the curve (AUC) metric is used to compare and evaluate binary classification by algorithms that return probabilities, such as logistic regression. To map the probabilities into classifications, these are compared against a threshold value.
    The relevant curve is the receiver operating characteristic curve (ROC curve). The ROC curve plots the true positive rate (TPR) of predictions (or recall) against the false positive rate (FPR) as a function of the threshold value, above which a prediction is considered positive. Increasing the threshold results in fewer false positives, but more false negatives.
    AUC is the area under this ROC curve. Therefore, AUC provides an aggregated measure of the model performance across all possible classification thresholds. AUC scores vary between 0 and 1. A score of 1 indicates perfect accuracy, and a score of one half (0.5) indicates that the prediction is not better than a random classifier.
  • Accuracy: The ratio of the number of correctly classified items to the total number of (correctly and incorrectly) classified items. It is used for both binary and multiclass classification. Accuracy measures how close the predicted class values are to the actual values. Values for accuracy metrics vary between zero (0) and one (1). A value of 1 indicates perfect accuracy, and 0 indicates perfect inaccuracy.
  • BalancedAccuracy: BalancedAccuracy is a metric that measures the ratio of accurate predictions to all predictions. This ratio is calculated after normalizing true positives (TP) and true negatives (TN) by the total number of positive (P) and negative (N) values. It is used in both binary and multiclass classification and is defined as follows: 0.5*((TP/P)+(TN/N)), with values ranging from 0 to 1. BalancedAccuracy gives a better measure of accuracy when the number of positives or negatives differ greatly from each other in an imbalanced dataset, such as when only 1% of email is spam.
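
The BalancedAccuracy and LogLoss definitions above can be verified with a short sketch on illustrative values:

```python
# Sketch: BalancedAccuracy and LogLoss as defined above, on illustrative data.
import math

tp, tn, fp, fn = 8, 90, 1, 1           # imbalanced: few actual positives
p, n = tp + fn, tn + fp                # actual positives / actual negatives

balanced_accuracy = 0.5 * ((tp / p) + (tn / n))   # 0.5*((TP/P)+(TN/N))

# Log loss on a few probability outputs: it penalizes confident wrong predictions.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.1, 0.8, 0.3]           # predicted probability of class 1
log_loss = -sum(y * math.log(q) + (1 - y) * math.log(1 - q)
                for y, q in zip(y_true, y_prob)) / len(y_true)

print(round(balanced_accuracy, 4), round(log_loss, 4))
```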

Multiclass Classification:

  • F1macro: The F1macro score applies F1 scoring to multiclass classification problems. It does this by calculating the precision and recall, and then taking their harmonic mean to calculate the F1 score for each class. Lastly, the F1macro averages the individual scores to obtain the F1macro score. F1macro scores vary between 0 and 1. A score of 1 indicates the best possible performance, and 0 indicates the worst.
  • PrecisionMacro: The precision macro computes precision for multiclass classification problems. It does this by calculating precision for each class and averaging scores to obtain precision for several classes. PrecisionMacro scores range from zero (0) to one (1). Higher scores reflect the model's ability to predict true positives (TP) out of all of the positives that it identifies, averaged across multiple classes.
  • Accuracy: The ratio of the number of correctly classified items to the total number of (correctly and incorrectly) classified items. It is used for both binary and multiclass classification. Accuracy measures how close the predicted class values are to the actual values. Values for accuracy metrics vary between zero (0) and one (1). A value of 1 indicates perfect accuracy, and 0 indicates perfect inaccuracy.
  • BalancedAccuracy: BalancedAccuracy is a metric that measures the ratio of accurate predictions to all predictions. This ratio is calculated after normalizing true positives (TP) and true negatives (TN) by the total number of positive (P) and negative (N) values. It is used in both binary and multiclass classification and is defined as follows: 0.5*((TP/P)+(TN/N)), with values ranging from 0 to 1. BalancedAccuracy gives a better measure of accuracy when the number of positives or negatives differ greatly from each other in an imbalanced dataset, such as when only 1% of email is spam.
  • LogLoss: Log loss, also known as cross-entropy loss, is a metric used to evaluate the quality of the probability outputs, rather than the outputs themselves. It is used in both binary and multiclass classification and in neural nets. It is also the cost function for logistic regression. Log loss is an important metric to indicate when a model makes incorrect predictions with high probabilities. Values range from 0 to infinity. A value of 0 represents a model that perfectly predicts the data.
  • RecallMacro: The RecallMacro computes recall for multiclass classification problems by calculating recall for each class and averaging scores to obtain recall for several classes. RecallMacro scores range from 0 to 1. Higher scores reflect the model's ability to predict true positives (TP) in a dataset, whereas a true positive reflects a positive prediction that is also an actual positive value in the data. It is often insufficient to measure only recall, because predicting every output as a true positive will yield a perfect recall score.
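
Macro-averaged metrics such as F1macro compute the per-class score and then take the unweighted mean across classes. A sketch with illustrative predictions:

```python
# Sketch: macro averaging (as in F1macro/PrecisionMacro/RecallMacro) —
# compute the metric per class, then take the unweighted mean.
def macro_f1(y_true, y_pred, classes):
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall    = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Illustrative 3-class predictions.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(round(macro_f1(y_true, y_pred, classes=[0, 1, 2]), 4))
```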

Classification

Classification

The objective of a Classification Task to learn how to assign labels to items based on various fields in the dataset and then assign a label to new item based on current values. This requires labels to be present in the dataset. Labels can be binary e.g. malware/not malware, spam/not-spam or can belong to more than 2 classes as well. Learning the label assignment is done during the Training phase and assigning labels to new data is done during the Inference phase. The dataset for both Training and Inference phases is provided by running FortiSIEM reports.

Classification Algorithms for Local Mode

In this mode, the following algorithms can run locally within the FortiSIEM Supervisor/Worker cluster.

  • Decision Tree Classifier: A supervised classification algorithm that uses a Decision Tree constructed using the feature variables to classify new data points. It requires a set of pre-labelled data points during the training process to construct the tree.
  • Logistic Regression: A supervised binary classification algorithm that can classify data points into two classes based on sigmoid function using a set of features. It requires a set of pre-labelled data points during the training process.
  • Random Forest Classifier: A supervised classification algorithm that uses Ensemble learning and Bootstrapping techniques to improve the accuracy of Decision Tree based classification algorithms. Ensemble learning uses multiple models and Bootstrapping randomly samples datasets and then averages the results of each model to improve accuracy. It requires a set of pre-labelled data points during the training process to construct the trees.
  • SGDClassifier: A supervised classification algorithm that uses Stochastic Gradient Descent (SGD) update techniques for existing classifiers and is computationally efficient for large datasets. It requires a set of pre-labelled data points during the training process.
  • Support Vector Classifier: Support Vector Classifier (SVC), also called Linear Support Vector Machine (SVM), is a supervised classification algorithm that separates data points using a hyperplane with specified margins (support vector). It requires a set of pre-labelled data points during the training process.

Running Classification Local Mode

Step 1: Design

First identify the following items:

  • Fields to use for Classification: Each field must be a numerical field.
  • Class Label: The class to which the data corresponds to.
  • A FortiSIEM Report to get this data.

To provide several samples of the data, you can choose one of the following time attributes as a report column

  • Event Receive Hour
  • Event Receive Date

Requirements

  1. Report must contain
    • A Class Label
    • One or more numerical fields to use for classification
    To provide several samples, you can provide a time field. This is optional.
  2. Each field must be present in the report result; else the whole row will be ignored by the Machine Learning algorithm.
  3. There can be additional columns in the report and they will be ignored by the machine learning algorithm. However, it is recommended to remove unnecessary columns from the dataset to reduce the size of the dataset exchanged between App Server and phAnomaly modules, during Training and Inference.

Go to Analytics > Search and run various reports. Once you have the right report, save it in Resources > Machine Learning Jobs.

Step 2: Prepare Data

Prepare the data for training.

  1. Go to Analytics > Machine Learning, and click the Import Machine Learning Jobs (open folder) icon.
  2. Select the data source in one of three ways:
    1. To prepare data from a Machine Learning Job, choose Import via Jobs and select the Job which has associated Report and algorithm.
    2. To prepare data from the Report folder, choose Import via Report and select the report from the Resources > Machine Learning Jobs folder.
    3. To prepare data from a CSV file, choose Import via CSV File and upload the file. In this mode, you can see how the Training algorithm performs, but you cannot schedule it for inference, since the data may not be present in FortiSIEM.
  3. For Case 2a and 2b, select the Report Time Range and the Organization for Service Provider Deployments.
  4. Click Run. The results are displayed in Machine Learning > Prepare tab.

Step 3: Train

Train the Classification task using the dataset in Step 2.

  1. Go to Analytics > Machine Learning > Train.
  2. If you chose Import via Jobs, then make sure the Class Label and Fields to use for Classification are populated correctly.
  3. If you chose Import via Report or Import via CSV File, then
    1. Set Run Mode to Local
    2. Set Task to Classification
    3. Choose the Algorithm
    4. Choose the Class Label and Fields to use for Classification from the report fields.
  4. Choose the Train factor, which should be at least 70%. A Train factor of 70% means that 70% of the data is used for Training and the remaining 30% for Testing.
  5. Click Train.
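The Train factor split in step 4 can be sketched as follows. scikit-learn's `train_test_split` is shown for illustration on synthetic data; FortiSIEM performs the split internally:

```python
# Sketch of the Train factor split: with a 70% Train factor, 70% of rows
# train the model and the remaining 30% test it. Data here is synthetic.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100, n_features=4, random_state=0)

train_factor = 0.70
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=train_factor, random_state=0)

print(len(X_train), len(X_test))  # 70 30
```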

After you have completed the Training, the results are shown in the Train > Output tab.

Model Quality:

The following metrics show the quality of the trained classification model.

  • True Positive (TP): Correct classification for label = 1, i.e. actual = 1 and predicted = 1
  • True Negative (TN): Correct classification for label = 0, i.e. actual = 0 and predicted = 0
  • False Positive (FP): Incorrect classification for label = 0, i.e. actual = 0 and predicted = 1
  • False Negative (FN): Incorrect classification for label = 1, i.e. actual = 1 and predicted = 0
  • Accuracy: Accuracy is the total number of correct classifications divided by the total number of attempted classifications, i.e. (TP+TN)/(TP+TN+FP+FN). It is between 0 and 1. A score close to 1 means accurate classification.
  • Recall: Recall is calculated by dividing the True Positives by everything that should have been predicted as Positive. So Recall is TP/(TP+FN).
  • Precision: Precision is calculated by dividing the True Positives by anything that was classified as a Positive. So Precision is TP/(TP+FP).
  • F1 Score: F1 Score combines Precision and Recall into a single metric by taking their harmonic mean. F1 Score = 2 / ((1/Precision)+(1/Recall)). It ranges from 0 to 1, and a higher F1 score denotes a better quality classifier.
  • ROC AUC: It measures the entire two-dimensional area underneath the ROC curve (think integral calculus) from (0,0) to (1,1). It indicates how well the model can distinguish between classes. Higher ROC AUC values indicate that the model is better at predicting 0 classes as 0 and 1 classes as 1.
  • Confusion Matrix: Presents TP, TN, FP and FN in a matrix
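As a worked example, the metrics above can be computed from a hypothetical confusion matrix. The counts below are made up for illustration:

```python
# Worked example of the binary metrics from hypothetical confusion-matrix
# counts (100 test samples in total).
TP, TN, FP, FN = 40, 45, 5, 10

accuracy  = (TP + TN) / (TP + TN + FP + FN)
recall    = TP / (TP + FN)            # of actual positives, how many were found
precision = TP / (TP + FP)            # of predicted positives, how many correct
f1        = 2 / ((1 / precision) + (1 / recall))

print(round(accuracy, 3), round(recall, 3),
      round(precision, 3), round(f1, 3))   # 0.85 0.8 0.889 0.842
```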

If you want to change the algorithm parameters and re-train, then click Tune & Train, change the parameters and click Save & Train.

Step 4: Schedule

Once the training is complete, you can schedule the job for Inference.

  • Input Details section shows the Report and the Org chosen for the report. These were already chosen during Prepare phase and will be used during Inference.
  • Algorithm Setup shows the Machine Learning Algorithm and its parameters. These were already chosen during Train phase and will be used during Inference.
  • Schedule Setup shows the Job details and schedules
    • Job Id: Specifies the unique Job Id. If it is a system job, it will be overwritten with a new Job Id when it is saved as a User job. If it is a User job, the user has the option to save it as a new User job with a different Job Id or keep the same Job Id.
    • Job Name: Name of the job. You can overwrite this. If a job with the same name already exists, a date stamp will be appended.
    • Job Description: Description of the job.
    • Inference schedule: The frequency at which the Inference job will be run.
    • Retraining schedule: The frequency at which the model will be retrained. Retraining is expensive and should be carefully considered. The recommended retraining interval is at least 7 days.
    • (Retraining) Report Window: The report time window used during the retraining process. A long time window may cause the report to run slowly, so this should also be carefully considered. It is recommended to choose the same time window as in the Prepare process.
    • Job Group: Shows the folder under Resources > Machine Learning Jobs where this job will be saved.
  • Action on Inference: Specifies the action to be taken when an anomaly is found during the Inference process.
    • Two choices are available – creating a FortiSIEM Incident or sending an email. Specify the emails if you want emails to be sent. Make sure that email server is specified in Admin > Settings > Email.
    • Check Enabled to ensure that Inference is enabled.

Finally, click Save to save this to the database. If it is a system job, a new User job will be created. If it is a User job, the user has the option to save it as a new User job with a different Job Id or overwrite the current job.

Classification Algorithms for Local Auto Mode

In this mode, FortiSIEM picks the best algorithm from the following:

  • Decision Tree Classifier
  • Random Forest Classifier
  • SGDClassifier
  • Support Vector Classifier (SVC)

Note: The Max Run Time parameter limits the amount of time this job runs. By default, it is set to 5 minutes. The longer the job runs, the better the results that can potentially be generated.

Running Classification Local Auto Mode

To run, follow the steps in Running Classification Local Mode, but in Step 3, 3a, select Run Mode as Local Auto.

Classification Algorithms for AWS Mode

In this mode, the algorithms run in AWS SageMaker.

Running Classification AWS Mode

Step 0: Set Up AWS

Set up AWS SageMaker by following the instructions in Set Up AWS SageMaker.

Configure AWS in FortiSIEM by following the instructions in Configure FortiSIEM to use AWS SageMaker.

Step 1: Design

First identify the following items:

  • Fields to use for Classification: Each field must be a numerical field.
  • Class Label: The class to which the data corresponds.
  • A FortiSIEM Report to get this data.

To provide several samples of the data, you can choose one of the following time attributes as a report column:

  • Event Receive Hour
  • Event Receive Date

Requirements

  1. Report must contain
    • A Class Label
    • One or more numerical fields to use for classification
    To provide several samples, you can provide a time field. This is optional.
  2. Each field must be present in the report result; otherwise, the whole row will be ignored by the Machine Learning algorithm.
  3. There can be additional columns in the report and they will be ignored by the machine learning algorithm. However, it is recommended to remove unnecessary columns from the dataset to reduce the size of the dataset exchanged between App Server and phAnomaly modules, during Training and Inference.

Go to Analytics > Search and run various reports. Once you have the right report, save it in Resources > Machine Learning Jobs.

Step 2: Prepare Data

Prepare the data for training.

  1. Go to Analytics > Machine Learning, and click the Import Machine Learning Jobs (open folder) icon.
  2. Select the data source in one of three ways:
    1. To prepare data from a Machine Learning Job, choose Import via Jobs and select the Job which has associated Report and algorithm.
    2. To prepare data from the Report folder, choose Import via Report and select the report from the Resources > Machine Learning Jobs folder.
    3. To prepare data from a CSV file, choose Import via CSV File and upload the file. In this mode, you can see how the Training algorithm performs, but you cannot schedule it for inference, since the data may not be present in FortiSIEM.
  3. For Case 2a and 2b, select the Report Time Range and the Organization for Service Provider Deployments.
  4. Click Run. The results are displayed in Machine Learning > Prepare tab.

Step 3: Train

Train the Classification task using the dataset in Step 2.

  1. Go to Analytics > Machine Learning > Train.
  2. If you chose Import via Jobs, then make sure the Class Label and Fields to use for Classification are populated correctly.
  3. If you chose Import via Report or Import via CSV File, then
    1. Set Run Mode to AWS
    2. Set Task to Classification
    3. Choose the Algorithm
    4. Choose the Class Label and Fields to use for Classification from the report fields.
  4. Choose the Train factor, which should be at least 70%. A Train factor of 70% means that 70% of the data is used for Training and the remaining 30% for Testing.
  5. Click Train.

After you have completed the Training, the results are shown in the Train > Output tab.

Model Quality:

The following metrics show the quality of the trained classification model.

Binary Classification

  • True Positive (TP): Correct classification for label = 1, i.e. actual = 1 and predicted = 1
  • True Negative (TN): Correct classification for label = 0, i.e. actual = 0 and predicted = 0
  • False Positive (FP): Incorrect classification for label = 0, i.e. actual = 0 and predicted = 1
  • False Negative (FN): Incorrect classification for label = 1, i.e. actual = 1 and predicted = 0
  • Precision: The precision of the final model on the validation dataset. If you choose this metric as the objective, we recommend setting a target recall by setting the binary_classifier_model_selection_criteria hyperparameter to precision_at_target_recall and setting the value for the target_recall hyperparameter. This objective metric is only valid for binary classification.
  • Recall: The recall of the final model on the validation dataset. If you choose this metric as the objective, we recommend setting a target precision by setting the binary_classifier_model_selection_criteria hyperparameter to recall_at_target_precision and setting the value for the target_precision hyperparameter. This objective metric is only valid for binary classification.
  • roc_auc_score: The area under the receiver operating characteristic curve (ROC curve) of the final model on the validation dataset. This objective metric is only valid for binary classification.
  • binary_classification_accuracy: The accuracy of the final model on the validation dataset. This objective metric is only valid for binary classification.
  • binary_f_beta: The F-beta score of the final model on the validation dataset. By default, the F-beta score is the F1 score, which is the harmonic mean of the validation:precision and validation:recall metrics. This objective metric is only valid for binary classification.
  • Confusion Matrix: Presents TP, TN, FP and FN in a matrix

Multiclass Classification

  • Dcg: The discounted cumulative gain of the final model on the validation dataset. This objective metric is only valid for multiclass classification.
  • multiclass_accuracy: The accuracy of the final model on the validation dataset. This objective metric is only valid for multiclass classification.
  • multiclass_top_k_accuracy: The accuracy among the top k labels predicted on the validation dataset. If you choose this metric as the objective, we recommend setting the value of k using the accuracy_top_k hyperparameter. This objective metric is only valid for multiclass classification.
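The top-k accuracy idea can be illustrated with scikit-learn's `top_k_accuracy_score`: a prediction counts as correct if the true label is among the k highest-scored classes. The labels and scores below are made up:

```python
# Sketch of multiclass top-k accuracy on hypothetical per-class scores.
from sklearn.metrics import top_k_accuracy_score

y_true = [0, 1, 2, 2]
# Per-class scores for each sample (3 classes).
y_score = [
    [0.50, 0.30, 0.20],   # true class 0 ranked 1st
    [0.40, 0.35, 0.25],   # true class 1 ranked 2nd
    [0.20, 0.50, 0.30],   # true class 2 ranked 2nd
    [0.10, 0.20, 0.70],   # true class 2 ranked 1st
]

print(top_k_accuracy_score(y_true, y_score, k=1))  # exact top guesses only
print(top_k_accuracy_score(y_true, y_score, k=2))  # credit for 2nd-place guesses
```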

If you want to change the algorithm parameters and re-train, then click Tune & Train, change the parameters and click Save & Train.

Step 4: Schedule

Once the training is complete, you can schedule the job for Inference.

  • Input Details section shows the Report and the Org chosen for the report. These were already chosen during Prepare phase and will be used during Inference.
  • Algorithm Setup shows the Machine Learning Algorithm and its parameters. These were already chosen during Train phase and will be used during Inference.
  • Schedule Setup shows the Job details and schedules
    • Job Id: Specifies the unique Job Id. If it is a system job, it will be overwritten with a new Job Id when it is saved as a User job. If it is a User job, the user has the option to save it as a new User job with a different Job Id or keep the same Job Id.
    • Job Name: Name of the job. You can overwrite this. If a job with the same name already exists, a date stamp will be appended.
    • Job Description: Description of the job.
    • Inference schedule: The frequency at which the Inference job will be run.
    • Retraining schedule: The frequency at which the model will be retrained. Retraining is expensive and should be carefully considered. The recommended retraining interval is at least 7 days.
    • (Retraining) Report Window: The report time window used during the retraining process. A long time window may cause the report to run slowly, so this should also be carefully considered. It is recommended to choose the same time window as in the Prepare process.
    • Job Group: Shows the folder under Resources > Machine Learning Jobs where this job will be saved.
  • Action on Inference: Specifies the action to be taken when an anomaly is found during the Inference process.
    • Two choices are available – creating a FortiSIEM Incident or sending an email. Specify the emails if you want emails to be sent. Make sure that email server is specified in Admin > Settings > Email.
    • Check Enabled to ensure that Inference is enabled.

Finally, click Save to save this to the database. If it is a system job, a new User job will be created. If it is a User job, the user has the option to save it as a new User job with a different Job Id or overwrite the current job.

Classification Algorithms for AWS Auto Mode

In this mode, FortiSIEM automatically chooses the best algorithm with the optimal parameters. Depending on the size of your dataset (whether it is greater than or smaller than 100MB), algorithms from HPO mode or from ensembling mode will be considered. For more information, see https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-model-support-validation.html. The definitions below are taken from the Amazon SageMaker Developer Guide.

HPO mode (Dataset > 100MB)

  • Linear learner: A supervised learning algorithm that can solve either classification or regression problems.
  • XGBoost: A supervised learning algorithm that attempts to accurately predict a target variable by combining an ensemble of estimates from a set of simpler and weaker models.
  • Deep learning algorithm: A multilayer perceptron (MLP) and feedforward artificial neural network. This algorithm can handle data that is not linearly separable.

Ensembling mode (Dataset <=100MB)

  • LightGBM: An optimized framework that uses tree-based algorithms with gradient boosting. This algorithm uses trees that grow in breadth, rather than depth, and is highly optimized for speed.
  • CatBoost: A framework that uses tree-based algorithms with gradient boosting. Optimized for handling categorical variables.
  • XGBoost: A framework that uses tree-based algorithms with gradient boosting that grows in depth, rather than breadth.
  • Random Forest: A tree-based algorithm that uses several decision trees on random sub-samples of the data with replacement. The trees are split into optimal nodes at each level. The decisions of each tree are averaged together to prevent overfitting and improve predictions.
  • Extra Trees: A tree-based algorithm that uses several decision trees on the entire dataset. The trees are split randomly at each level. The decisions of each tree are averaged to prevent overfitting and to improve predictions. Extra trees add a degree of randomization in comparison to the random forest algorithm.
  • Linear Models: A framework that uses a linear equation to model the relationship between two variables in observed data.
  • Neural network torch: A neural network model that is implemented using PyTorch.
  • Neural network fast.ai: A neural network model that is implemented using fast.ai.
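The averaging idea behind Random Forest and Extra Trees can be sketched with scikit-learn. This is illustrative only; SageMaker uses its own implementations, and the data here is synthetic:

```python
# Minimal sketch of ensemble averaging: each forest trains many decision
# trees and averages their per-class probabilities to reduce overfitting.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier

X, y = make_classification(n_samples=300, n_features=6, random_state=1)

# Random Forest: trees on random sub-samples with replacement, optimal splits.
rf = RandomForestClassifier(n_estimators=50, random_state=1).fit(X, y)
# Extra Trees: trees on the full dataset, random splits at each level.
et = ExtraTreesClassifier(n_estimators=50, random_state=1).fit(X, y)

# Averaged class probabilities across all trees in each forest.
print(rf.predict_proba(X[:1]))
print(et.predict_proba(X[:1]))
```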

Note: The Max Run Time parameter limits the amount of time this job runs. By default, it is set to 225 minutes. The longer the job runs, the better the results that can potentially be generated.

Running Classification AWS Auto Mode

To run, follow the steps in Running Classification AWS Mode, but in Step 3, 3a, select Run Mode as AWS Auto.

For Step 3, the following model quality information pertains.

Model Quality:

The following metrics show the quality of the trained classification model.

Binary Classification

  • True Positive (TP): Correct classification for label = 1, i.e. actual = 1 and predicted = 1
  • True Negative (TN): Correct classification for label = 0, i.e. actual = 0 and predicted = 0
  • False Positive (FP): Incorrect classification for label = 0, i.e. actual = 0 and predicted = 1
  • False Negative (FN): Incorrect classification for label = 1, i.e. actual = 1 and predicted = 0
  • F1: The F1 score is the harmonic mean of the precision and recall, defined as follows: F1 = 2 * (precision * recall) / (precision + recall). It is used for binary classification into classes traditionally referred to as positive and negative. Predictions are said to be true when they match their actual (correct) class, and false when they do not. Precision is the ratio of the true positive predictions to all positive predictions, and it includes the false positives in a dataset. Precision measures the quality of the prediction when it predicts the positive class. Recall (or sensitivity) is the ratio of the true positive predictions to all actual positive instances. Recall measures how completely a model predicts the actual class members in a dataset. F1 scores vary between 0 and 1. A score of 1 indicates the best possible performance, and 0 indicates the worst.
  • LogLoss: Log loss, also known as cross-entropy loss, is a metric used to evaluate the quality of the probability outputs, rather than the outputs themselves. It is used in both binary and multiclass classification and in neural nets. It is also the cost function for logistic regression. Log loss is an important metric to indicate when a model makes incorrect predictions with high probabilities. Values range from 0 to infinity. A value of 0 represents a model that perfectly predicts the data.
  • Recall: Recall measures how well an algorithm correctly predicts all of the true positives (TP) in a dataset. A true positive is a positive prediction that is also an actual positive value in the data. Recall is defined as follows: Recall = TP/(TP+FN), with values ranging from 0 to 1. Higher scores reflect a better ability of the model to predict true positives (TP) in the data. It is used in binary classification.

    Recall is important when testing for cancer because it's used to find all of the true positives. A false positive (FP) reflects a positive prediction that is actually negative in the data. It is often insufficient to measure only recall, because predicting every output as a true positive yields a perfect recall score.
  • Precision: Precision measures how well an algorithm predicts the true positives (TP) out of all of the positives that it identifies. It is defined as follows: Precision = TP/(TP+FP), with values ranging from zero (0) to one (1), and is used in binary classification. Precision is an important metric when the cost of a false positive is high. For example, the cost of a false positive is very high if an airplane safety system is falsely deemed safe to fly. A false positive (FP) reflects a positive prediction that is actually negative in the data.
  • AUC: The area under the curve (AUC) metric is used to compare and evaluate binary classification by algorithms that return probabilities, such as logistic regression. To map the probabilities into classifications, these are compared against a threshold value.
    The relevant curve is the receiver operating characteristic curve (ROC curve). The ROC curve plots the true positive rate (TPR) of predictions (or recall) against the false positive rate (FPR) as a function of the threshold value, above which a prediction is considered positive. Increasing the threshold results in fewer false positives, but more false negatives.
    AUC is the area under this ROC curve. Therefore, AUC provides an aggregated measure of the model performance across all possible classification thresholds. AUC scores vary between 0 and 1. A score of 1 indicates perfect accuracy, and a score of one half (0.5) indicates that the prediction is not better than a random classifier.
  • Accuracy: The ratio of the number of correctly classified items to the total number of (correctly and incorrectly) classified items. It is used for both binary and multiclass classification. Accuracy measures how close the predicted class values are to the actual values. Values for accuracy metrics vary between zero (0) and one (1). A value of 1 indicates perfect accuracy, and 0 indicates perfect inaccuracy.
  • BalancedAccuracy: BalancedAccuracy is a metric that measures the ratio of accurate predictions to all predictions. This ratio is calculated after normalizing true positives (TP) and true negatives (TN) by the total number of positive (P) and negative (N) values. It is used in both binary and multiclass classification and is defined as follows: 0.5*((TP/P)+(TN/N)), with values ranging from 0 to 1. BalancedAccuracy gives a better measure of accuracy when the number of positives or negatives differ greatly from each other in an imbalanced dataset, such as when only 1% of email is spam.
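As a worked example of why BalancedAccuracy matters on imbalanced data, consider the hypothetical counts below, where positives are rare and mostly missed:

```python
# Contrast of Accuracy vs BalancedAccuracy on a hypothetical imbalanced
# dataset: 990 negatives, 10 positives, most positives missed.
TP, FN = 2, 8        # positives: P = 10, mostly missed
TN, FP = 985, 5      # negatives: N = 990, mostly correct

P, N = TP + FN, TN + FP
accuracy = (TP + TN) / (P + N)
balanced_accuracy = 0.5 * ((TP / P) + (TN / N))

print(round(accuracy, 3))           # looks high despite missing most positives
print(round(balanced_accuracy, 3))  # exposes the poor positive-class recall
```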

Multiclass Classification:

  • F1macro: The F1macro score applies F1 scoring to multiclass classification problems. It does this by calculating the precision and recall, and then taking their harmonic mean to calculate the F1 score for each class. Lastly, the F1macro averages the individual scores to obtain the F1macro score. F1macro scores vary between 0 and 1. A score of 1 indicates the best possible performance, and 0 indicates the worst.
  • PrecisionMacro: The precision macro computes precision for multiclass classification problems. It does this by calculating precision for each class and averaging scores to obtain precision for several classes. PrecisionMacro scores range from zero (0) to one (1). Higher scores reflect the model's ability to predict true positives (TP) out of all of the positives that it identifies, averaged across multiple classes.
  • Accuracy: The ratio of the number of correctly classified items to the total number of (correctly and incorrectly) classified items. It is used for both binary and multiclass classification. Accuracy measures how close the predicted class values are to the actual values. Values for accuracy metrics vary between zero (0) and one (1). A value of 1 indicates perfect accuracy, and 0 indicates perfect inaccuracy.
  • BalancedAccuracy: BalancedAccuracy is a metric that measures the ratio of accurate predictions to all predictions. This ratio is calculated after normalizing true positives (TP) and true negatives (TN) by the total number of positive (P) and negative (N) values. It is used in both binary and multiclass classification and is defined as follows: 0.5*((TP/P)+(TN/N)), with values ranging from 0 to 1. BalancedAccuracy gives a better measure of accuracy when the number of positives or negatives differ greatly from each other in an imbalanced dataset, such as when only 1% of email is spam.
  • LogLoss: Log loss, also known as cross-entropy loss, is a metric used to evaluate the quality of the probability outputs, rather than the outputs themselves. It is used in both binary and multiclass classification and in neural nets. It is also the cost function for logistic regression. Log loss is an important metric to indicate when a model makes incorrect predictions with high probabilities. Values range from 0 to infinity. A value of 0 represents a model that perfectly predicts the data.
  • RecallMacro: The RecallMacro computes recall for multiclass classification problems by calculating recall for each class and averaging scores to obtain recall for several classes. RecallMacro scores range from 0 to 1. Higher scores reflect the model's ability to predict true positives (TP) in a dataset, whereas a true positive reflects a positive prediction that is also an actual positive value in the data. It is often insufficient to measure only recall, because predicting every output as a true positive will yield a perfect recall score.
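The macro-averaged metrics above can be illustrated with scikit-learn on made-up labels: each class's score is computed separately, then the scores are averaged with equal weight per class:

```python
# Sketch of macro averaging: per-class precision/recall/F1 are computed
# first, then averaged across classes. Labels below are hypothetical.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

precision_macro = precision_score(y_true, y_pred, average="macro")
recall_macro = recall_score(y_true, y_pred, average="macro")
f1_macro = f1_score(y_true, y_pred, average="macro")

print(round(precision_macro, 3),
      round(recall_macro, 3),
      round(f1_macro, 3))
```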