ML Based Anomaly Detection

The anomaly detection model of the machine learning feature observes the URLs, parameters, and HTTP methods of HTTP and/or HTTPS sessions passing to your web servers. It builds mathematical models to detect abnormal traffic. To determine whether a request is legitimate or a potential attack attempt, it performs the following tasks:

Captures and collects inputs, such as URL parameters, to build a mathematical model of allowed access
Observes the HTTP method of the traffic
Matches anomalies against pre-trained threat models
Detects attacks

FortiWeb employs two layers of machine learning to detect malicious attacks. The first layer uses a Hidden Markov Model (HMM): it monitors access to the application and collects data to build a mathematical model for every parameter and HTTP method. Once the model is built, FortiWeb verifies every request against it to determine whether the request is an anomaly.
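The first layer can be pictured with a textbook Hidden Markov Model. The sketch below uses the standard forward algorithm to compute how likely a sequence of symbols is under a model. It is a minimal illustration only: the two hidden states, the symbol alphabet (A for letters, N for digits, _), and all probabilities are hypothetical values, not FortiWeb's trained parameters, and FortiWeb's actual model structure is not described in this section.

```python
# Minimal HMM forward algorithm: computes P(observation sequence | model).
# States, symbols, and probabilities are HYPOTHETICAL illustration values,
# not FortiWeb's trained model.
from typing import Dict, List

STATES = ["s0", "s1"]

START: Dict[str, float] = {"s0": 0.6, "s1": 0.4}
TRANS: Dict[str, Dict[str, float]] = {
    "s0": {"s0": 0.7, "s1": 0.3},
    "s1": {"s0": 0.4, "s1": 0.6},
}
# Emission probabilities over generalized symbols: A = letters, N = digits.
EMIT: Dict[str, Dict[str, float]] = {
    "s0": {"A": 0.8, "N": 0.1, "_": 0.1},
    "s1": {"A": 0.1, "N": 0.8, "_": 0.1},
}


def sequence_probability(symbols: List[str]) -> float:
    """Return P(symbols | HMM) using the forward algorithm."""
    if not symbols:
        return 1.0
    # alpha[s] = P(symbols seen so far, current hidden state = s)
    alpha = {s: START[s] * EMIT[s].get(symbols[0], 0.0) for s in STATES}
    for sym in symbols[1:]:
        alpha = {
            s: sum(alpha[p] * TRANS[p][s] for p in STATES) * EMIT[s].get(sym, 0.0)
            for s in STATES
        }
    return sum(alpha.values())
```

A symbol the model never emits (for example `<` in this toy model) drives the probability to zero, which is the intuition behind treating rarely seen input as anomalous.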

Once the first layer flags a request as an anomaly, FortiWeb uses the second layer of machine learning to verify whether it is a real attack or just a benign anomaly that should be ignored. To do so, FortiWeb includes pre-built, trained threat models, each representing an attack category such as SQL injection or cross-site scripting. Each threat model has been trained on the analysis of thousands of attack samples. Threat models are continuously updated through the FortiWeb Security Service: when new attack types emerge, the FortiGuard team analyzes the new threats and retrains the relevant threat model. The new threat model is then pushed to all customer installations, much as signatures are updated.
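The following sketch shows the shape of the two-layer decision under stated assumptions. Layer 1 is passed in as a callable (`is_anomaly`), and layer 2 uses linear SVM decision functions, because the threat models are described later on this page as Support Vector Machine models. The feature extraction (`features`), the threat model names, the weights, and the biases are all hypothetical placeholders; FortiWeb's real threat models are trained by FortiGuard and are not exposed this way.

```python
# HYPOTHETICAL two-layer check: layer 1 flags anomalies, layer 2 asks each
# threat model whether the anomaly looks like an attack. Features, weights,
# and biases are made-up placeholders.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class LinearSVM:
    """Decision function of a trained linear SVM: attack if w . x + b > 0."""
    weights: List[float]
    bias: float

    def is_attack(self, features: List[float]) -> bool:
        score = sum(w * x for w, x in zip(self.weights, features)) + self.bias
        return score > 0.0


def features(value: str) -> List[float]:
    # Hypothetical features: quote count, angle-bracket count, SQL keyword count.
    lower = value.lower()
    return [
        float(value.count("'")),
        float(value.count("<") + value.count(">")),
        float(sum(lower.count(k) for k in ("select", "union", " or "))),
    ]


THREAT_MODELS: Dict[str, LinearSVM] = {
    "sql_injection": LinearSVM([1.0, 0.0, 2.0], -2.5),
    "xss": LinearSVM([0.0, 1.5, 0.0], -2.5),
}


def classify(value: str, is_anomaly: Callable[[str], bool]) -> str:
    if not is_anomaly(value):                    # layer 1: anomaly check
        return "normal"
    x = features(value)
    for name, model in THREAT_MODELS.items():    # layer 2: threat models
        if model.is_attack(x):
            return f"attack:{name}"
    return "benign anomaly"                      # no threat model confirms it
```

The point of the second layer is visible in the last line: an input can be unusual for the application yet match no attack category, in which case it is treated as a benign anomaly.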

How is an anomaly detection model built?

FortiWeb uses a machine learning model to analyze the parameters in your domain and decide whether each parameter value is legitimate. The model is built from a large number of parameter value samples collected from real requests to the domain.

Traffic must meet all of the following conditions to be used as a sample:

The response code of response packet must be 200 or 302;
The response content-type of response packet must be text/html;
The request packet must have parameter(s) in URL or body.
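A minimal sketch of the sample-eligibility check above, assuming the content-type condition means the media type text/html (any charset parameter ignored) and that the body is parsed as URL-encoded form data. The function names `is_sample` and `_has_params` are hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlsplit


def _has_params(encoded: str) -> bool:
    # parse_qsl returns one (name, value) pair per parameter found.
    return bool(parse_qsl(encoded, keep_blank_values=True))


def is_sample(status: int, content_type: Optional[str], url: str, body: str = "") -> bool:
    """Return True if a request/response pair meets all three sample conditions."""
    if status not in (200, 302):                 # condition 1: response code
        return False
    # Condition 2: compare the media type only, e.g. "text/html; charset=utf-8".
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    if media_type != "text/html":
        return False
    # Condition 3: at least one parameter in the URL query or the body.
    return _has_params(urlsplit(url).query) or _has_params(body)
```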

When a sample is collected, the system generalizes it into a pattern. For example, “abcd_123@abc.com” and “abcdefgecdf_12345678@efg.com” are both generalized to the pattern “A_N@A.A”. The anomaly detection model is built from the patterns, not from the raw samples.
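The generalization step can be approximated with a regular expression. This sketch assumes only two rules, inferred from the example above: each run of ASCII letters becomes `A`, each run of digits becomes `N`, and every other character is kept. FortiWeb's actual rules may cover more character classes, and `generalize` is a hypothetical name.

```python
import re

# One match per run of ASCII letters or run of digits.
_RUN = re.compile(r"[A-Za-z]+|[0-9]+")


def generalize(value: str) -> str:
    """Collapse each letter run to 'A' and each digit run to 'N'; keep other chars."""
    return _RUN.sub(lambda m: "A" if m.group(0)[0].isalpha() else "N", value)
```

Matching both runs in a single pass avoids an ordering problem: replacing letters and digits in two separate passes could turn an inserted `N` back into `A`.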

FortiWeb analyzes the characteristics of the patterns and builds an initial model when 400 samples are collected. The system runs the initial model to detect anomalies, while it keeps collecting more samples to refine it.

Once 1,200 samples have been collected, the system evaluates whether the patterns have changed significantly since the initial model was built:

If very few new patterns have been generalized, the patterns are stable, and the system switches from the initial model to a standard model.
If many new patterns keep coming in, the system continues collecting samples to cover as many patterns as possible. It does not switch to the standard model until the patterns become stable.
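The initial-to-standard transition can be sketched as a small state machine. The 400 and 1,200 sample thresholds come from the text above; how FortiWeb measures "very few" new patterns, and how often it re-evaluates, are not stated, so `MAX_NEW_PATTERN_RATIO` and `RECHECK_WINDOW` are hypothetical values.

```python
from dataclasses import dataclass, field
from typing import Set

INITIAL_SAMPLES = 400          # from the text: build the initial model
STANDARD_SAMPLES = 1200        # from the text: first stability evaluation
RECHECK_WINDOW = 400           # HYPOTHETICAL: samples between evaluations
MAX_NEW_PATTERN_RATIO = 0.05   # HYPOTHETICAL meaning of "very few new patterns"


@dataclass
class ParameterModel:
    stage: str = "collecting"          # collecting -> initial -> standard
    samples: int = 0
    patterns: Set[str] = field(default_factory=set)
    window_samples: int = 0            # samples since the last evaluation
    window_new: int = 0                # of those, how many had an unseen pattern

    def add(self, pattern: str) -> None:
        is_new = pattern not in self.patterns
        self.patterns.add(pattern)
        self.samples += 1

        if self.stage == "collecting":
            if self.samples >= INITIAL_SAMPLES:
                self.stage = "initial"          # initial model built; detection starts
            return

        if self.stage == "initial":
            self.window_samples += 1
            self.window_new += int(is_new)
            if (self.samples >= STANDARD_SAMPLES
                    and self.window_samples >= RECHECK_WINDOW):
                if self.window_new / self.window_samples <= MAX_NEW_PATTERN_RATIO:
                    self.stage = "standard"     # patterns are stable
                else:
                    self.window_samples = 0     # still changing: keep collecting
                    self.window_new = 0
        # In the "standard" stage, samples keep arriving to refresh the model.
```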

The standard model is much more reliable and accurate than the initial model. However, your domains may change as new URLs are added and existing parameters take on new functions, so the mathematical model of a parameter may no longer match what FortiWeb originally observed. To keep the machine learning model up to date, FortiWeb continues collecting new samples, discarding outdated patterns and introducing new ones.
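One way to discard outdated patterns is a sliding window over the most recent samples. The sketch below is an assumption for illustration, not FortiWeb's documented update policy: a pattern is forgotten once none of the last `window` samples produced it. `RollingPatterns` is a hypothetical name.

```python
from collections import Counter, deque
from typing import Deque, Set


class RollingPatterns:
    """Keep only patterns seen in the most recent `window` samples (HYPOTHETICAL policy)."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.recent: Deque[str] = deque()
        self.counts: Counter = Counter()

    def add(self, pattern: str) -> None:
        self.recent.append(pattern)
        self.counts[pattern] += 1
        if len(self.recent) > self.window:
            old = self.recent.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]        # outdated pattern discarded

    def known(self) -> Set[str]:
        return set(self.counts)
```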

An anomaly detection policy is part of a server policy. It is created on the Policy > Server Policy page.

Anomaly detection must learn the charset of each domain before it can work properly. The charset can be learned automatically from the server's responses or configured in the CLI. All of the following conditions must be met for the learning to succeed:

The response code of response packet must be 200 or 302;
The response content-type of response packet must be text/html;
The request packet must have parameter(s) in the URL or body. See the following examples:
- Parameter in the URL:
  
  http://www.testdomain.com/autotest/test.html?testargument=2000
- Parameters in the body:
  
  POST /autotest/csh/mlarg3.php HTTP/1.1
  Connection: keep-alive
  Accept-Encoding: gzip, deflate
  Accept: */*
  User-Agent: python-requests/2.12.2
  Host: testmydomain
  Cookie: cookiesession1=3473FD0DAS38CIHAIRSOZ3D9RDVTB577;
  X-Forwarded-For: 2.2.2.2
  Content-Length: 15
  Content-Type: application/x-www-form-urlencoded
  
  myparameter=123

Note: The Content-Type of a request with body parameters must be "application/x-www-form-urlencoded". Other content types, such as "application/json", are not supported.
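The parameter rules above can be sketched with Python's standard URL parsing. This is an illustration under assumptions: parameters come from the URL query string and, only when the request Content-Type is application/x-www-form-urlencoded, from the body; other body types such as JSON contribute no parameters. The helper `response_charset` reads only the charset parameter of a response's Content-Type header, which is one possible source of the charset mentioned above (an HTML page can also declare it elsewhere). All function names are hypothetical.

```python
from email.message import Message
from typing import Dict, List, Optional
from urllib.parse import parse_qs, urlsplit

FORM_TYPE = "application/x-www-form-urlencoded"


def _media_type(content_type: Optional[str]) -> str:
    # "text/html; charset=UTF-8" -> "text/html"
    return (content_type or "").split(";", 1)[0].strip().lower()


def extract_parameters(url: str, content_type: Optional[str] = None,
                       body: str = "") -> Dict[str, List[str]]:
    """Collect parameters from the query string and, for form bodies only, the body."""
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    if body and _media_type(content_type) == FORM_TYPE:
        for name, values in parse_qs(body, keep_blank_values=True).items():
            params.setdefault(name, []).extend(values)
    # Any other body type (for example application/json) is ignored.
    return params


def response_charset(content_type: str) -> Optional[str]:
    """Return the lowercase charset parameter of a Content-Type header, if present."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()
```

Applied to the two examples above, the GET URL yields `testargument` and the POST body yields `myparameter`.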

To create an Anomaly Detection policy:

Click Policy > Server Policy.
Select an existing server policy.
Note: Machine learning policies can't be created while you are creating a server policy. First create the server policy, then click its Edit button to create a machine learning policy.
Scroll down to the Machine Learning section at the bottom of the page, click the Anomaly Detection tab, then click Create. The New Machine Learning dialog opens.
Click the + (Add) sign after the Domain field to add the desired domains, so that the system collects samples and builds a machine learning model for them.
Select whether to trust or block the specified source IP addresses.
Click the + (Add) sign after the IP Range field to add an IP address or range. When IP List Type is Trust, the system collects data only from the specified range; when IP List Type is Block, the system excludes data from the specified range.
Click OK.

When you are done, go back to Server Policy and select the server policy that contains the anomaly detection policy you just created. The following buttons are available in the Anomaly Detection tab.

Button	Function
View	Click to view and edit machine learning policies and their learning results. Note: You can also access the Machine Learning page by clicking Machine Learning, and then selecting a specific policy.
Start/Stop	Click to start/stop Machine Learning for the policy.
Retrain	Click to restart machine learning for all URLs in the policy. Note: This discards all existing learning results and then relearns all data.
Discard	Click to remove all learned URLs from the policy. Note: FortiWeb will not re-learn those URLs.
Export	Click to export all the data generated by the machine learning policy.
Import	Click to import machine learning data from your local directory to FortiWeb. Note: Machine learning data generated in FortiWeb 6.0 cannot be imported into FortiWeb 6.0.1, and vice versa.

All anomaly detection policies that you have created will show up on the Web Protection > ML Based Anomaly Detection page, where you can configure or edit them to your preference.

To configure an anomaly detection policy:

Click Web Protection > ML Based Anomaly Detection.
Double-click the server policy that contains the desired anomaly detection policy (or select it and click the Edit button at the top of the page). The Edit Anomaly Detection Configuration page opens. It organizes the anomaly detection policy into several sections, each with parameters you can use to configure the policy.
Follow the instructions in the following subsections to configure an anomaly detection policy.
Click OK when done.

Some machine learning settings are available only in the CLI, such as the number of samples required for the initial and standard models and how frequently the model is updated. For details, see config waf machine-learning-policy in the FortiWeb CLI Reference.

These settings are hidden in the web UI, where their default values are used; the defaults are sufficient in most cases. We don't recommend changing them in the CLI unless you understand their impact on the machine learning model.
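The Strictness Level for Anomaly setting described in the following table compares each sample's probability with the threshold μ + strictness level * σ. The sketch below computes that threshold from a list of probabilities collected during the sample collection period. It assumes σ is the population standard deviation (the text does not say which variant is used), and the function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def anomaly_threshold(training_probs: List[float], strictness: int) -> float:
    """Return mu + strictness * sigma over the collected sample probabilities."""
    if not 1 <= strictness <= 10:
        raise ValueError("strictness level must be between 1 and 10")
    mu = mean(training_probs)
    sigma = pstdev(training_probs)      # population standard deviation (assumption)
    return mu + strictness * sigma


def is_anomaly(probability: float, threshold: float) -> bool:
    # The text flags a sample whose probability is larger than the threshold.
    return probability > threshold
```

With the same training data, a strictness level of 1 yields a lower threshold than a level of 3, so more samples are flagged, which matches the statement that smaller strictness levels are stricter.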

Sections & Parameters	Function
Anomaly Detection Settings
Strictness Level for Anomaly	The value of the strictness level ranges from 1 to 10. The system identifies a sample as an anomaly when the following condition is true: probability of the sample > μ + strictness level * σ. μ and σ are calculated from the probabilities of all the samples collected during the sample collection period, where μ is the average of all the parameters' probabilities and σ is their standard deviation. Because μ and σ are fixed values, the threshold "μ + strictness level * σ" varies only with the strictness level you set. The smaller the strictness level, the stricter the anomaly detection model. This option sets a global value for all the parameters. To adjust the strictness level for a specific parameter, see Manage anomaly-detecting settings.
Threat Models	The system scans anomalies to verify whether they are attacks, using trained Support Vector Machine (SVM) models to check whether an anomaly is a real attack. Click Edit to enable or disable the threat models for different types of threats, such as cross-site scripting, SQL injection, and code injection. Currently, seven trained SVM models are provided for seven attack types.
Domain Settings
Create New	Add domains to let FortiWeb perform sample collection and intrusion detection on them. You can use the wildcard * to represent multiple domains. For the maximum number of domains the Machine Learning feature supports on your FortiWeb model, see Maximum number of ADOMs, policies, & server pools per appliance.
(View Domain)	View anomaly detection reports for that specific domain, which list the URLs and parameters in the domain. See Viewing domain data.
(Retrain)	Retrain the models of the corresponding domain. Note: Retraining deletes all existing learning results.
(Export)	Export the anomaly detection data of this domain.
Delete	Remove the selected domain(s). Note: This will remove all machine-learning results related to the domain(s) as well.
Import	Import the anomaly detection data from your local directory to FortiWeb.
Action Settings
Action	All requests are scanned first by the HMM and then by the threat models. Double-click the cells in the Action Settings table to choose the action FortiWeb takes in each listed situation when an attack is verified. The available actions are: Alert: Accepts the connection and generates an alert email and/or log message. Alert & Deny: Blocks the request (or resets the connection) and generates an alert and/or log message. Period Block: Blocks the request for the period set in Block Period.
Block Period	Enter the number of seconds that you want to block the requests. The valid range is 1–3,600 seconds (1 hour). This option only takes effect when you choose Period Block in Action.
Severity	Select the severity level for this anomaly type. The severity level will be displayed in the alert email and/or log message.
Trigger Action	Select a trigger policy that you have set in Log&Report > Log Policy > Trigger Policy. If a potential or definite anomaly or an HTTP method violation is detected, the system sends email and/or log messages according to the trigger policy.
Advanced Settings

IP List Type and Source IP list

Add IP ranges in the Source IP list, then select Trust or Block to allow or disallow collecting traffic data samples from these IP addresses.

Trust: The system will collect samples only from the IP ranges in the Source IP list.
Block: The system will collect samples from any IP addresses except the ones in the Source IP list.

Whether you select Trust or Block, if you leave the Source IP list blank, the system collects traffic data samples from any IP address.

If you select Trust and add IP ranges to the Source IP list, FortiWeb collects traffic data samples only from the specified IP ranges.
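The Trust and Block rules above reduce to a small membership check. This sketch uses Python's `ipaddress` module; accepting Source IP list entries as single addresses, CIDR networks, or `start-end` ranges is an assumption about how entries might be written, and `should_collect` is a hypothetical name.

```python
from ipaddress import IPv4Address, IPv6Address, ip_address, ip_network
from typing import List, Union

Address = Union[IPv4Address, IPv6Address]


def _matches(addr: Address, entry: str) -> bool:
    entry = entry.strip()
    if "-" in entry:                                  # "start-end" range (assumed format)
        start, end = (ip_address(part.strip()) for part in entry.split("-", 1))
        if start.version != addr.version or end.version != addr.version:
            return False
        return start <= addr <= end
    return addr in ip_network(entry, strict=False)    # single IP or CIDR network


def should_collect(client_ip: str, list_type: str, source_ips: List[str]) -> bool:
    """Decide whether a request from client_ip may be used as a learning sample."""
    if not source_ips:                 # blank list: collect from any IP address
        return True
    addr = ip_address(client_ip)
    listed = any(_matches(addr, entry) for entry in source_ips)
    if list_type == "trust":
        return listed                  # collect only from listed ranges
    if list_type == "block":
        return not listed              # collect from all but the listed ranges
    raise ValueError("list_type must be 'trust' or 'block'")
```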

URL Replacer Policy

Select the name of the URL Replacer Policy that you have created in Machine Learning Templates.

If your web applications have dynamic URLs or unusual parameter styles, you must configure a URL Replacer Policy so that FortiWeb can recognize them.

If you have not created a URL Replacer Policy yet, you can leave this option empty for now and edit this policy later, once the URL Replacer Policy is created. For more information on URL Replacer Policy, see Configure a URL replacer rule.
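To illustrate what a URL Replacer Policy is for, the sketch below rewrites a dynamic path into a fixed URL plus named parameters, so that the values can be learned like ordinary parameters. The rule format, the example paths, and `replace_url` are hypothetical; see Configure a URL replacer rule for how FortiWeb rules are actually defined.

```python
import re
from typing import Dict, List, Tuple

# HYPOTHETICAL rules: path regex with named groups -> normalized URL.
RULES: List[Tuple[re.Pattern, str]] = [
    (re.compile(r"^/shop/item/(?P<item_id>\d+)/view$"), "/shop/item/view"),
    (re.compile(r"^/user/(?P<user>[^/]+)/profile$"), "/user/profile"),
]


def replace_url(path: str) -> Tuple[str, Dict[str, str]]:
    """Return (normalized URL, extracted parameters) for the first matching rule."""
    for pattern, normalized in RULES:
        match = pattern.match(path)
        if match:
            return normalized, match.groupdict()
    return path, {}                    # no rule matched: leave the URL unchanged
```

Without such a rewrite, every item ID would appear as a distinct URL, and no single parameter would accumulate enough samples to be modeled.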