ML Based Anomaly Detection
The machine learning anomaly detection feature observes the URLs, parameters, and HTTP methods of HTTP and/or HTTPS sessions passing to your web servers, and builds mathematical models to detect abnormal traffic. To determine whether a request is legitimate or a potential attack attempt, it performs the following tasks:
- Captures and collects inputs, such as URL parameters, to build a mathematical model of allowed access
- Observes the HTTP method of the traffic
- Matches anomalies against pre-trained threat models
- Detects attacks
FortiWeb employs two layers of machine learning to detect malicious attacks. The first layer uses a Hidden Markov Model (HMM): it monitors access to the application and collects data to build a mathematical model for every parameter and HTTP method. Once the model is built, FortiWeb verifies every request against it to determine whether the request is an anomaly.
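This page does not describe the internal structure of FortiWeb's HMMs. Purely as an illustration of how an HMM can score an input, the sketch below applies the textbook scaled forward algorithm to a generalized pattern, treating each character (A, N, _, @, .) as one observation. The two hidden states, all probabilities, and the function name `forward_log_likelihood` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def forward_log_likelihood(obs: Sequence[str],
                           start: List[float],
                           trans: List[List[float]],
                           emit: List[Dict[str, float]]) -> float:
    """Return log P(obs | model) using the scaled forward algorithm."""
    n_states = len(start)
    log_prob = 0.0
    alpha: List[float] = []
    for t, symbol in enumerate(obs):
        if t == 0:
            # alpha_0(i) = P(start in i) * P(first symbol | i)
            alpha = [start[i] * emit[i].get(symbol, 0.0) for i in range(n_states)]
        else:
            # alpha_t(i) = sum_j alpha_{t-1}(j) * P(j -> i) * P(symbol | i)
            alpha = [sum(alpha[j] * trans[j][i] for j in range(n_states))
                     * emit[i].get(symbol, 0.0) for i in range(n_states)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence at all
        log_prob += math.log(scale)          # log P(obs) = sum of log scaling factors
        alpha = [a / scale for a in alpha]   # rescale to avoid numeric underflow
    return log_prob


# Hypothetical toy model: state 0 emits character classes, state 1 emits separators.
START = [1.0, 0.0]
TRANS = [[0.1, 0.9], [0.9, 0.1]]
EMIT = [{"A": 0.7, "N": 0.3}, {"_": 0.3, "@": 0.3, ".": 0.4}]
```

With these toy numbers, the pattern `"A_N@A.A"` scores higher than `"AAAA"`, and a pattern containing a symbol the model never emits, such as `"A<A"`, scores negative infinity. How FortiWeb converts such scores into an anomaly decision is not shown here.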
When the first layer flags a request as an anomaly, FortiWeb uses the second layer of machine learning to verify whether it is a real attack or a benign anomaly that should be ignored. For this purpose, FortiWeb includes pre-trained threat models, each representing an attack category such as SQL injection or cross-site scripting. Each threat model has been trained on thousands of attack samples. Threat models are continuously updated through the FortiWeb Security Service: when new attack types emerge, the FortiGuard team analyzes the new threats and retrains the relevant threat model. The new threat model is then pushed to all customer installations, much as signatures are updated.
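The two-layer decision flow described above can be sketched as follows. `classify_request`, its verdict strings, and the toy checks are hypothetical placeholders standing in for FortiWeb's HMM (first layer) and its trained threat models (second layer); only the ordering, anomaly check first and threat verification second, comes from the text.

```python
from typing import Callable, Dict, Mapping

# Hypothetical signatures: layer 1 judges (parameter name, value); layer 2 judges a value.
AnomalyCheck = Callable[[str, str], bool]
ThreatModel = Callable[[str], bool]


def classify_request(params: Mapping[str, str],
                     is_anomaly: AnomalyCheck,
                     threat_models: Mapping[str, ThreatModel]) -> Dict[str, str]:
    """Return a verdict per parameter: 'normal', 'benign anomaly', or an attack category."""
    verdicts: Dict[str, str] = {}
    for name, value in params.items():
        if not is_anomaly(name, value):
            verdicts[name] = "normal"  # layer 1 passes the value; layer 2 is skipped
            continue
        # Layer 2: only anomalies are checked against the threat models.
        category = next((cat for cat, model in threat_models.items() if model(value)), None)
        verdicts[name] = category if category is not None else "benign anomaly"
    return verdicts


# Toy stand-ins (hypothetical), not real detection logic.
def toy_is_anomaly(name: str, value: str) -> bool:
    return not value.isdigit()  # pretend the learned model only expects digits


toy_threat_models: Dict[str, ThreatModel] = {
    "SQL Injection": lambda value: "' or" in value.lower(),
    "Cross-site Scripting": lambda value: "<script" in value.lower(),
}
```

For example, `classify_request({"id": "42", "q": "' OR 1=1", "name": "O'Brien"}, toy_is_anomaly, toy_threat_models)` marks `id` as normal, `q` as SQL injection, and `name` as a benign anomaly that the second layer lets through.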
How an anomaly detection model is built
FortiWeb uses a machine learning model to analyze the parameters in your domain and decide whether each parameter value is legitimate. The model is built from a large number of parameter value samples collected from real requests to the domain.
Traffic must meet all of the following conditions to be treated as a sample:
- The response code of the response packet must be 200 or 302.
- The content type of the response packet must be text or HTML.
- The request packet must have parameter(s) in the URL or body.
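A minimal sketch of the three sample conditions as a filter. The function name `is_training_sample` is hypothetical; it assumes the response content type must be `text/html` (the wording used for charset learning below) with parameters such as `charset` ignored, assumes the request body is `application/x-www-form-urlencoded`, and, because `parse_qs` drops blank values by default, treats a parameter with an empty value as absent.

```python
from urllib.parse import parse_qs, urlsplit


def is_training_sample(status: int, response_content_type: str,
                       url: str, request_body: str = "") -> bool:
    """Apply the three sample conditions to one request/response pair (illustrative)."""
    # Condition 1: the response code must be 200 or 302.
    if status not in (200, 302):
        return False
    # Condition 2 (assumed to mean text/html): compare only the media type,
    # ignoring parameters such as "; charset=utf-8".
    media_type = response_content_type.split(";", 1)[0].strip().lower()
    if media_type != "text/html":
        return False
    # Condition 3: at least one parameter in the URL query string or the urlencoded body.
    has_url_params = bool(parse_qs(urlsplit(url).query))
    has_body_params = bool(parse_qs(request_body))
    return has_url_params or has_body_params
```

For instance, a 200 `text/html` response to `http://www.testdomain.com/autotest/test.html?testargument=2000` qualifies, while the same request answered with a 404 or with `application/json` does not.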
When a sample is collected, the system generalizes it into a pattern. For example, “abcd_123@abc.com” and “abcdefgecdf_12345678@efg.com” are both generalized to the pattern “A_N@A.A”. The anomaly detection model is built from these patterns, not from the raw samples.
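The exact generalization rules are not documented here. The sketch below assumes the simplest rule that reproduces the example: each run of ASCII letters becomes `A`, each run of digits becomes `N`, and every other character is kept. The function name `generalize` is hypothetical, and how FortiWeb treats non-ASCII characters is not covered.

```python
import re

# Assumed rule: a run of ASCII letters -> "A", a run of digits -> "N", anything else kept.
_TOKEN = re.compile(r"[A-Za-z]+|[0-9]+")


def generalize(value: str) -> str:
    """Map a raw parameter value to a coarse pattern (illustrative rule only)."""
    return _TOKEN.sub(lambda m: "A" if m.group(0)[0].isalpha() else "N", value)
```

Under this rule, both example values map to `A_N@A.A`, so the model sees one pattern where the raw data had two distinct strings.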
FortiWeb analyzes the characteristics of the patterns and builds an initial model once 400 samples have been collected. The system uses the initial model to detect anomalies while it continues collecting samples to refine the model.
Once 1200 samples have been collected, the system evaluates how much the patterns have changed since the initial model was built:
- If very few new patterns have been generalized, the patterns are stable, and the system switches from the initial model to a standard model.
- If many new patterns keep appearing, the system continues collecting samples to cover as many patterns as possible. It does not switch to the standard model until the patterns become stable.
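The model lifecycle above can be sketched as a small state machine. The 400 and 1200 sample thresholds come from the text; the stability test does not. This sketch assumes, hypothetically, that "very few new patterns" means at most 5% of the most recent 800 samples produced a previously unseen pattern. The class name `ModelLifecycle` and both constants marked hypothetical are illustrative.

```python
from collections import deque

INITIAL_MODEL_SAMPLES = 400       # from the text
STABILITY_CHECK_SAMPLES = 1200    # from the text
# Hypothetical: judge stability on the most recent 800 samples.
RECENT_WINDOW = STABILITY_CHECK_SAMPLES - INITIAL_MODEL_SAMPLES
MAX_NEW_PATTERN_RATE = 0.05       # hypothetical "very few new patterns" limit


class ModelLifecycle:
    """Tracks one parameter's model stage: collecting -> initial -> standard."""

    def __init__(self) -> None:
        self.samples = 0
        self.patterns = set()
        self.recent_new = deque(maxlen=RECENT_WINDOW)  # True if a sample brought an unseen pattern
        self.stage = "collecting"

    def add_sample(self, pattern: str) -> str:
        self.samples += 1
        self.recent_new.append(pattern not in self.patterns)
        self.patterns.add(pattern)
        if self.stage == "collecting" and self.samples >= INITIAL_MODEL_SAMPLES:
            self.stage = "initial"  # initial model built; detection starts
        elif self.stage == "initial" and self.samples >= STABILITY_CHECK_SAMPLES:
            new_rate = sum(self.recent_new) / len(self.recent_new)
            if new_rate <= MAX_NEW_PATTERN_RATE:
                self.stage = "standard"  # patterns are stable
        return self.stage
```

Feeding the same pattern 1200 times reaches the standard stage at sample 1200; feeding a new pattern every time keeps the model in the initial stage, mirroring the "keep collecting until stable" rule.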
The standard model is much more reliable and accurate than the initial model. However, your domains may change as new URLs are added and existing parameters take on new functions, so the mathematical model of a parameter may drift from what FortiWeb originally observed. To keep the machine learning model up to date, FortiWeb continues collecting new samples to update it: outdated patterns are discarded and new patterns are introduced.
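The text does not say how FortiWeb decides that a pattern is outdated. One plausible mechanism, shown here only as an assumption, is a fixed-size window of recent samples: a pattern is discarded once none of the samples in the window produced it. The class `RollingPatternSet` and its window size are hypothetical.

```python
from collections import Counter, deque


class RollingPatternSet:
    """Keeps only patterns seen within the most recent `window` samples (hypothetical policy)."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.recent = deque()
        self.counts = Counter()

    def add(self, pattern: str) -> None:
        self.recent.append(pattern)
        self.counts[pattern] += 1
        if len(self.recent) > self.window:
            old = self.recent.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]  # outdated pattern discarded

    def known(self, pattern: str) -> bool:
        return pattern in self.counts
```

With a window of 3, adding `A`, `A_N`, `N`, `N` pushes `A` out of the window, so it is no longer a known pattern while `A_N` and `N` still are.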
The anomaly detection policy is part of a server policy. It is created on the Policy > Server Policy page.
Anomaly detection must learn the charset of each domain before it can work properly. The charset can be learned automatically from the server's response or configured via the CLI. All of the following conditions must be met for the learning to succeed:
- The response code of the response packet must be 200 or 302.
- The content type of the response packet must be text/html.
- The request packet must have parameter(s) in the URL or body, as in the following examples.
Parameter in the URL:
http://www.testdomain.com/autotest/test.html?testargument=2000
Parameter in the body:
POST /autotest/csh/mlarg3.php HTTP/1.1
Connection: keep-alive
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: python-requests/2.12.2
Host: testmydomain
Cookie: cookiesession1=3473FD0DAS38CIHAIRSOZ3D9RDVTB577;
X-Forwarded-For: 2.2.2.2
Content-Length: 15
Content-Type: application/x-www-form-urlencoded

myparameter=123
Note: The content type of the request body must be "application/x-www-form-urlencoded". Other content types, such as "application/json", are not supported.
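A minimal sketch of learning the charset from a server response, assuming the charset is read from the `charset` parameter of the response's Content-Type header (FortiWeb may use other sources as well; this page does not say). The function name `learn_charset` is hypothetical; it reuses the standard library's MIME header parser and applies the three conditions listed above.

```python
from email.message import Message
from typing import Optional


def learn_charset(status: int, content_type: str, request_has_params: bool) -> Optional[str]:
    """Return the charset declared by a qualifying response, or None (illustrative)."""
    # Conditions 1 and 3: response code 200/302, and the request carried parameters.
    if status not in (200, 302) or not request_has_params:
        return None
    msg = Message()
    msg["Content-Type"] = content_type  # parse the header with the stdlib MIME parser
    # Condition 2: the media type must be text/html.
    if msg.get_content_type() != "text/html":
        return None
    return msg.get_content_charset()  # lowercased charset, or None if not declared
```

A 200 response with `text/html; charset=UTF-8` yields `utf-8`; a response without a charset parameter yields nothing to learn, in which case configuring the charset via the CLI is the alternative mentioned above.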
To create an Anomaly Detection policy:
- Click Policy > Server Policy.
- Select an existing server policy.
Note: Machine learning policies can't be created while you are creating a server policy. Create the server policy first, then click its Edit button to create a machine learning policy.
- Scroll down to the Machine Learning section at the bottom of the page, click the Anomaly Detection tab, then click Create. The New Machine Learning dialog opens.
- Click the + (Add) sign after the Domain field to add the desired domains, so that the system collects samples and builds a machine learning model for them.
- Select whether to trust or block the specified source IP addresses.
- Click the + (Add) sign after the IP Range field to add an IP address or range. When IP List Type is Trust, the system collects data only from the specified range; when it is Block, the system excludes data from the specified range.
- Click OK.
When you are done, return to Policy > Server Policy and select the server policy that contains the anomaly detection policy you just created. The Anomaly Detection tab provides the following buttons:
Button | Function |
---|---|
View | Click to view and edit machine learning policies and their learning results. Note: You can also access the Machine Learning page by clicking Machine Learning, and then selecting a specific policy. |
Start/Stop | Click to start or stop machine learning for the policy. |
Retrain | Click to restart machine learning for all URLs in the policy. Note: This discards all existing learning results and then relearns all data. |
Discard | Click to remove all learned URLs from the policy. Note: FortiWeb will not relearn those URLs. |
Export | Click to export all the data generated by the machine learning policy. |
Import | Click to import machine learning data from your local directory to FortiWeb. Note: Machine learning data generated in FortiWeb 6.0 cannot be imported into FortiWeb 6.0.1, and vice versa. |
All anomaly detection policies that you have created will show up on the Web Protection > ML Based Anomaly Detection page, where you can configure or edit them to your preference.
To configure an anomaly detection policy:
- Click Web Protection > ML Based Anomaly Detection.
- Double-click the server policy that contains the desired anomaly detection policy (or select it and click Edit at the top of the page). The Edit Anomaly Detection Configuration page opens. It divides the anomaly detection policy into several sections, each with parameters you can use to configure the policy.
- Follow the instructions in the following subsections to configure an anomaly detection policy.
- Click OK when done.
Some machine learning settings are available only in the CLI, for example, the number of samples for the initial and standard models and how frequently the model is updated. These settings are hidden in the web UI, and their default values are used, which is sufficient for most cases. We don't recommend changing them through the CLI unless you fully understand their impact on the machine learning model. |
Sections & Parameters | Function |
---|---|
Anomaly Detection Settings | |
Strictness Level for Anomaly |
The strictness level ranges from 1 to 10. The system uses the following formula to determine whether a sample is an anomaly: probability of the sample > μ + (strictness level × σ). If the probability of the sample is larger than μ + (strictness level × σ), the sample is identified as an anomaly. μ and σ are calculated from the probabilities of all the samples collected during the sample collection period: μ is the mean of these probabilities and σ is their standard deviation. Because μ and σ are fixed, the threshold μ + (strictness level × σ) varies only with the strictness level you set. The smaller the strictness level, the stricter the anomaly detection model. This option sets a global value for all parameters. To adjust the strictness level for a specific parameter, see Manage anomaly-detecting settings. |
Threat Models |
The system scans anomalies to verify whether they are attacks, using trained Support Vector Machine (SVM) models to check whether an anomaly is a real attack. Click Edit to enable or disable threat models for different types of threats, such as cross-site scripting, SQL injection, and code injection. Currently, seven trained SVM models are provided, one for each of seven attack types. |
Domain Settings | |
Create New |
|
(View Domain) |
|
(Retrain) |
Retrain the models of the corresponding domain. Note: Retraining deletes all existing learning results. |
(Export) |
Export the anomaly detection data of this domain. |
Delete |
Remove the selected domain(s). Note: This will remove all machine-learning results related to the domain(s) as well. |
Import |
Import anomaly detection data from your local directory to FortiWeb. |
Action Settings | |
Action |
All requests are scanned first by the HMM and then by the threat models. Double-click the cells in the Action Settings table to choose the action FortiWeb takes when an attack is verified in each of the following situations:
|
Block Period |
Enter the number of seconds that you want to block the requests. The valid range is 1–3,600 seconds (1 hour). This option only takes effect when you choose Period Block in Action. |
Severity |
Select the severity level for this anomaly type. The severity level will be displayed in the alert email and/or log message. |
Trigger Action |
Select a trigger policy that you have set in Log&Report > Log Policy > Trigger Policy. If a potential or definite anomaly or an HTTP method violation is detected, FortiWeb sends email and/or log messages according to the trigger policy. |
Advanced Settings | |
IP List Type and Source IP list
Add IP ranges to the Source IP list, then select Trust or Block to allow or disallow collecting traffic data samples from these IP addresses.
- Trust: The system collects samples only from the IP ranges in the Source IP list.
- Block: The system collects samples from any IP address except those in the Source IP list.
Whether you select Trust or Block, if you leave the Source IP list blank, the system collects traffic data samples from any IP address.
URL Replacer Policy
Select the name of the URL Replacer Policy that you have created in Machine Learning Templates.
If web applications have dynamic URLs or unusual parameter styles, you must use a URL Replacer Policy so that they are recognized.
If you have not created a URL Replacer Policy yet, you can leave this option empty for now and edit this policy later, once the URL Replacer Policy is created. For more information on URL Replacer Policy, see Configure a URL replacer rule.
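The Strictness Level for Anomaly formula in the table above can be sketched as follows: a sample is flagged when its value exceeds μ + (strictness level × σ), with μ and σ fixed from the collection period. The function names are hypothetical, the sample values in the usage note are made up, and using the population standard deviation (rather than the sample standard deviation) is an assumption, since the text only says "standard deviation".

```python
import statistics
from typing import Sequence


def anomaly_threshold(collected_probabilities: Sequence[float], strictness_level: int) -> float:
    """Return mu + strictness_level * sigma over the collection-period values."""
    if not 1 <= strictness_level <= 10:
        raise ValueError("strictness level must be between 1 and 10")
    mu = statistics.fmean(collected_probabilities)
    # Assumption: population standard deviation.
    sigma = statistics.pstdev(collected_probabilities)
    return mu + strictness_level * sigma


def is_anomaly(sample_probability: float, threshold: float) -> bool:
    # As stated in the table, a value larger than the threshold is identified as an anomaly.
    return sample_probability > threshold
```

With collected values `[0.1, 0.2, 0.3, 0.4, 0.5]` (μ = 0.3, σ ≈ 0.141), a sample value of 0.6 is flagged at strictness level 1 (threshold ≈ 0.441) but not at level 3 (threshold ≈ 0.724), which is why a smaller strictness level makes detection stricter.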
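The IP List Type and Source IP list rules in the table above reduce to a small decision function. `should_collect` is hypothetical, and the accepted range formats (a single IP, CIDR notation, or a "start-end" range) are an assumption about how entries are written, not a documented FortiWeb format.

```python
import ipaddress
from typing import List


def _in_range(addr, spec: str) -> bool:
    """True if addr falls in a single-IP/CIDR spec or a "start-end" range (assumed formats)."""
    if "-" in spec:
        low, high = (ipaddress.ip_address(part.strip()) for part in spec.split("-", 1))
        # Compare only addresses of the same IP version (mixing v4 and v6 raises TypeError).
        return low.version == addr.version == high.version and low <= addr <= high
    return addr in ipaddress.ip_network(spec.strip(), strict=False)


def should_collect(source_ip: str, list_type: str, source_ip_list: List[str]) -> bool:
    if list_type not in ("Trust", "Block"):
        raise ValueError("list_type must be 'Trust' or 'Block'")
    if not source_ip_list:
        return True  # empty list: collect from any IP address, whichever type is selected
    addr = ipaddress.ip_address(source_ip)
    listed = any(_in_range(addr, spec) for spec in source_ip_list)
    # Trust: collect only from listed ranges. Block: collect from everything else.
    return listed if list_type == "Trust" else not listed
```

For example, with `["10.0.0.0/24"]`, a request from 10.0.0.5 is sampled under Trust and skipped under Block, while a request from 192.168.1.1 gets the opposite treatment.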