waf machine-learning-policy

How is an anomaly detection model built?

FortiWeb uses a machine learning model to analyze the parameters in your domain and decide whether each parameter value is legitimate. The model is built from a large number of parameter value samples collected from real requests to the domain.

When a sample is collected, the system generalizes it into a pattern. For example, “abcd_123@abc.com” and “abcdefgecdf_12345678@efg.com” are both generalized to the pattern “A_N@A.A”. The anomaly detection model is built from these patterns, not from the raw samples.
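To make the generalization step concrete, here is a minimal Python sketch. It assumes a simplified rule set (runs of letters become `A`, runs of digits become `N`, and all other characters are kept) that happens to reproduce the example above; FortiWeb's actual generalization rules are not described on this page, and the function name `generalize` is hypothetical.

```python
import re

# Hypothetical rule set: runs of ASCII letters become "A", runs of digits
# become "N", and every other character (such as "_", "@", ".") is kept.
# FortiWeb's real generalization rules may differ.
_TOKEN = re.compile(r"[A-Za-z]+|[0-9]+")


def generalize(sample: str) -> str:
    """Map a raw parameter value to a coarse pattern string."""
    return _TOKEN.sub(lambda m: "A" if m.group()[0].isalpha() else "N", sample)


if __name__ == "__main__":
    for value in ("abcd_123@abc.com", "abcdefgecdf_12345678@efg.com"):
        print(value, "->", generalize(value))  # both print "A_N@A.A"
```

Because many different raw values collapse into the same pattern, the model only has to learn a comparatively small set of shapes rather than every individual value.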

FortiWeb analyzes the characteristics of the patterns and builds an initial model once 400 samples have been collected. The system uses the initial model to detect anomalies while it continues collecting samples to refine the model.

Once 1,200 samples have been collected, the system evaluates whether the patterns have changed significantly since the initial model was built:

If very few new patterns are generalized, the patterns are stable, and the system switches from the initial model to the standard model.
If many new patterns keep coming in, the system continues collecting samples to cover as many patterns as possible. It does not switch to the standard model until the patterns become stable.
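The following Python sketch models the switch from the initial model to the standard model. It is an illustration under stated assumptions, not FortiWeb's implementation: the class and method names are hypothetical, the thresholds mirror the defaults of `start-min-count` (400), `switch-min-count` (1200), and `switch-percent` (5%), and "patterns are stable" is approximated with the `switch-percent` ratio (generalized patterns / raw samples * 100) documented in the table below.

```python
from dataclasses import dataclass, field


@dataclass
class ModelLifecycle:
    """Hypothetical tracker for the collecting -> initial -> standard switch."""

    start_min_count: int = 400     # mirrors start-min-count
    switch_min_count: int = 1200   # mirrors switch-min-count
    switch_percent: float = 5.0    # mirrors switch-percent (%)
    samples: int = 0
    patterns: set = field(default_factory=set)
    state: str = "collecting"

    def add_sample(self, pattern: str) -> str:
        """Record one generalized sample and return the current model state."""
        self.samples += 1
        self.patterns.add(pattern)
        if self.state == "collecting" and self.samples >= self.start_min_count:
            self.state = "initial"
        if self.state == "initial" and self.samples >= self.switch_min_count:
            # Ratio of distinct patterns to raw samples, as a percentage.
            ratio = len(self.patterns) / self.samples * 100
            if ratio < self.switch_percent:
                self.state = "standard"
        return self.state
```

For example, 1,200 samples drawn from 10 distinct patterns give a ratio of about 0.83%, so the sketch switches to the standard model at the 1,200th sample. If every sample produced a new pattern, the ratio would stay at 100% and the sketch would keep using the initial model.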

The standard model is much more reliable and accurate than the initial model. However, your domains may change as new URLs are added and existing parameters take on new functions, so the model of a given parameter may drift from what FortiWeb originally observed. To keep the machine learning model up to date, FortiWeb continues collecting new samples to update it, discarding outdated patterns and introducing new ones.
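Pattern ageing can be sketched in Python as follows, assuming each standard-model update counts as one sub-window and a pattern is dropped after `sub-window-count` consecutive updates with no new samples (see the `sub-window-count` row in the table below). The class `SlidingPatternStore` and its methods are hypothetical and exist only for illustration.

```python
from collections import Counter


class SlidingPatternStore:
    """Hypothetical sketch of how outdated patterns age out of the model."""

    def __init__(self, sub_window_count: int = 40):
        self.sub_window_count = sub_window_count
        self.counts = Counter()   # samples seen per pattern
        self.idle = {}            # consecutive updates without a new sample
        self.pending = Counter()  # samples received since the last update

    def observe(self, pattern: str) -> None:
        """Queue one generalized sample for the next model update."""
        self.pending[pattern] += 1

    def update(self) -> list:
        """Fold pending samples into the model; return discarded patterns."""
        for pattern, n in self.pending.items():
            self.counts[pattern] += n
            self.idle[pattern] = 0
        discarded = []
        for pattern in list(self.counts):
            if pattern in self.pending:
                continue
            self.idle[pattern] += 1
            if self.idle[pattern] >= self.sub_window_count:
                del self.counts[pattern]
                del self.idle[pattern]
                discarded.append(pattern)
        self.pending.clear()
        return discarded
```

With `sub_window_count=20`, a pattern that stops receiving samples survives 19 further updates and is discarded on the 20th, while patterns that keep receiving samples stay in the model.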

To use this command, your administrator account’s access control profile must have either w or rw permission to the wafgrp area. For details, see Permissions.

Syntax

config waf machine-learning-policy

edit <machine-learning-policy_id>

set start-min-count <start-min-count_int>

set renovate-short-time <renovate-short-time_int>

set renovate-long-time <renovate-long-time_int>

set switch-min-count <switch-min-count_int>

set switch-percent <switch-percent_int>

set sliding-win-time <sliding-win-time_int>

set sub-window-size <sub-window-size_int>

set sub-window-count <sub-window-count_int>

set denoise-percent <denoise-percent_int>

set denoise-threshold <denoise-threshold_int>

set sample-limit-by-ip <sample-limit-by-ip_int>

set svm-model {xss | sql-injection | code-injection | command-injection | lfi-rfi | common-injection | remote-exploits}

set svm-type {standard | extended}

set anomaly-detection-threshold <anomaly-detection-threshold_int>

set waf machine-learning-policy

set action-anomaly {alert | alert_deny | block-period}

set block-period-anomaly <block-period_int>

set severity-definitely {High | Info | Low | Medium}

set trigger-definitely <policy_name>

set status {enable | disable}

set ip-expire-intval <int>

set ip-expire-cnts <int>

set ip-argcount-limit {enable | disable}

set ip-list-type {Trust | Black}

set url-replacer-policy <policy_name>

set threat-model {enable | disable}

set parameters-limit-per-conn {enable | disable}

config allow-domain-name

edit <allow-domain-name_id>

set domain-name <domain-name_str>

set domain-index <domain-index_id>

set hmm-probability-sample-length-check {enable | disable}

set sample-length-threshold <int>

set hmm-probability-threshold <int>

set character-set {AUTO | ISO-8859-1 | ISO-8859-2 | ISO-8859-3 | ISO-8859-4 | ISO-8859-5 | ISO-8859-6 | ISO-8859-7 | ISO-8859-8 | ISO-8859-9 | ISO-8859-10 | ISO-8859-15 | GB2312 | BIG5 | ISO-2022-JP | ISO-2022-JP-2 | Shift-JIS | ISO-2022-KR | UTF-8}

end

config source-ip-list

edit <source-ip-list_id>

set <ip>

end

Variable	Description	Default
<machine-learning-policy_id>	Enter the ID of the machine learning policy. It's the number displayed in the "#" column of the machine learning policy table on the Machine Learning Policy page. The valid range is 0–65535.	`No default`
start-min-count <start-min-count_int>	FortiWeb builds an initial model when the sample count reaches `start-min-count`.	`400`
renovate-short-time <renovate-short-time_int>	While the initial model is in use, FortiWeb keeps updating it. `renovate-short-time` defines how frequently FortiWeb updates the model if new patterns keep coming in. The valid range is 15 to 1440 (minutes).	15 (minutes)
renovate-long-time <renovate-long-time_int>	`renovate-long-time` defines how frequently FortiWeb updates the initial model even if no new pattern has been generalized from recently collected samples. For example, if you set the value to 8 (hours) and no new pattern appears in the past 8 hours, FortiWeb still updates the model every 8 hours. The valid range is 8 to 720 (hours).	8 (hours)
switch-min-count <switch-min-count_int>	When the number of samples reaches `switch-min-count`, FortiWeb will evaluate whether to build a standard model. The valid range is 800 to 3000.	`1200`
switch-percent <switch-percent_int>	`switch-percent` = (the number of generalized patterns / the number of raw samples) * 100 (%). When this ratio is smaller than the value you set, FortiWeb switches from the initial model to the standard model. The valid range is 2 to 20.	5 (%)
sliding-win-time <sliding-win-time_int>	After the standard model is built, FortiWeb keeps updating it with the newest samples so that the model stays up to date when your domain changes, for example when new URLs are added or existing parameters take on new functions. `sliding-win-time` defines how frequently FortiWeb updates the standard model. The valid range is 15 to 1440 (minutes).	15 (minutes)
sub-window-size <sub-window-size_int>	If no new pattern is generalized during the `sliding-win-time`, the system does not update the standard model until the number of samples reaches `sub-window-size`. The `sub-window-size` can be set to 50 or 100.	50
sub-window-count <sub-window-count_int>	FortiWeb counts each update of the standard model as one sub-window. If `sub-window-count` consecutive updates pass without any sample arriving for a pattern, FortiWeb considers the pattern outdated and discards it. The `sub-window-count` can be set to 20, 40, or 80. For example, if `sub-window-count` is 20, FortiWeb discards a pattern when no sample has been collected for it over 20 consecutive model updates.	40
denoise-percent <denoise-percent_int>	Reducing noisy samples is important for building an accurate model. During the sample collection period, the system ranks all samples by their probabilities. The samples with the lowest probabilities are selected as noise reduction samples and are then filtered with `denoise-threshold` to determine whether each one is noise. For example, if you set `denoise-percent` to 3, the 3% of samples with the lowest probabilities are selected as noise reduction samples. The valid range is 1 to 10.	3 (%)
denoise-threshold <denoise-threshold_int>	The system uses the following formula to determine whether a noise reduction sample is indeed noise: the probability of the sample > μ + `denoise-threshold` * σ, where μ is the average probability of the noise reduction samples and σ is the denoise standard deviation. Imagine a circle with most samples crowded in the center and several samples scattered around its edge. If the probability of a sample is larger than μ + `denoise-threshold` * σ, the sample lies far from the central cluster, which indicates that it might be an anomaly, that is, noise. A larger `denoise-threshold` tolerates a greater distance between a sample and the central cluster, so fewer samples are treated as noise. To identify more samples as noise, set `denoise-threshold` smaller. The valid range is 1 to 10.	2
threat-model {enable \| disable}	Enable to check whether detected anomalies are real attacks, using the trained Support Vector Machine (SVM) models.	`enable`
svm-model {xss \| sql-injection \| code-injection \| command-injection \| lfi-rfi \| common-injection \| remote-exploits}	Enable or disable threat models for different types of threats, such as cross-site scripting, SQL injection, and code injection. Currently, seven trained Support Vector Machine models are provided, one for each of the seven attack types.	`enable`
svm-type {standard \| extended}	If `standard` is selected, the system automatically disables the SVM models that can easily trigger false positives. If `extended` is selected, the system enables all SVM models.	`standard`
anomaly-detection-threshold <anomaly-detection-threshold_int>	The value of `anomaly-detection-threshold` (the strictness level) ranges from 1 to 10. The system uses the following formula to decide whether a sample is an anomaly: the probability of the sample > μ + the strictness level * σ. If the probability of a sample is larger than μ + the strictness level * σ, the sample is identified as an anomaly. μ and σ are calculated from the probabilities of all the samples collected during the sample collection period: μ is their average and σ is their standard deviation. Because μ and σ are fixed values, μ + the strictness level * σ varies only with the strictness level you set. The smaller the strictness level, the stricter the anomaly detection model. This option sets a global value for all parameters. To adjust the strictness level for a specific parameter, see Manage anomaly-detecting settings.	`0.1`
parameters-limit-per-conn {enable \| disable}	Enable to avoid collecting samples only from parameters in the same connection. Anomaly detection is more effective when the system builds machine learning models from parameters distributed across different connections.	`enable`
action-anomaly {alert \| alert_deny \| block-period}	Choose the action FortiWeb takes when a definite attack is verified. `alert`: accepts the connection and generates an alert email and/or log message. `alert_deny`: blocks the request (or resets the connection) and generates an alert and/or log message. `block-period`: blocks the request for a certain period of time.	`alert_deny`
block-period-anomaly <block-period_int>	Enter the number of seconds for which to block requests. The valid range is 1–3,600 seconds. This option takes effect only when `action-anomaly` is set to `block-period`.	`600`
severity-definitely {High \| Info \| Low \| Medium}	Select the severity level for this anomaly type. The severity level will be displayed in the alert email and/or log message.	`High`
trigger-definitely <policy_name>	Select a trigger policy that you have set in Log&Report > Log Policy > Trigger Policy. If a definite anomaly is detected, the system sends email and/or log messages according to the trigger policy.	No default.
status {enable \| disable}	Enable to set the policy status to Running; disable to set it to Stopped.	`enable`
url-replacer-policy <policy_name>	Select the name of the URL Replacer Policy that you have created in Machine Learning Templates. If web applications have dynamic URLs or unusual parameter styles, you must configure a URL Replacer Policy to recognize them.	No default.
trigger-potential <policy_name>	Select a trigger policy that you have set in Log&Report > Log Policy > Trigger Policy. If a potential anomaly is detected, the system sends email and/or log messages according to the trigger policy.
<allow-domain-name_id>	Enter the ID of the domain name entry. The valid range is 1–65,535.	No default.
ip-list-type {Trust \| Black}	Specify whether the source IP list allows (`Trust`) or denies (`Black`) sample collection from the listed IPs.	`Trust`
domain-name <domain-name_str>	Enter a full domain name, or use the wildcard `*` to cover multiple domains under one profile.	No default.
domain-index <domain-index_id>	The number automatically assigned by the system when the domain name is created.	No default.
hmm-probability-sample-length-check {enable \| disable}	Enable to check whether the parameter value has an unexpected length or a high anomaly probability.	disable
sample-length-threshold <int>	If the length of the parameter value is larger than the specified threshold, the system does not send the value to the SVM models for further validation; instead, it treats the value directly as an anomaly. The valid range is 0–1,024. 0 means the check is not applied.	0
hmm-probability-threshold <int>	If the anomaly probability of the parameter value is larger than the specified threshold, the system does not send the value to the SVM models for further validation; instead, it treats the value directly as an anomaly. The valid range is 0–2,000. 0 means the check is not applied. If you are not sure how to set a proper probability value, you can refer to two places. In Parameter View, next to the Strictness Level for Anomaly option, click Test Sample and enter a parameter value to check its probability; repeat the test with different values until you have an idea of a reasonable threshold. In Attack Log, find an Anomaly Detection attack and click it to view the log details, which include its probability.	0
character-set {AUTO \| ISO-8859-1 \| ISO-8859-2 \| ISO-8859-3 \| ISO-8859-4 \| ISO-8859-5 \| ISO-8859-6 \| ISO-8859-7 \| ISO-8859-8 \| ISO-8859-9 \| ISO-8859-10 \| ISO-8859-15 \| GB2312 \| BIG5 \| ISO-2022-JP \| ISO-2022-JP-2 \| Shift-JIS \| ISO-2022-KR \| UTF-8}	The character encoding of the domain when you set the domain manually.	No default.
<source-ip-list_id>	Enter the ID of the source IP entry. The valid range is 1–9,223,372,036,854,775,807.	No default.
<ip>	Enter the IP range for the source IP list.	No default.
ip-expire-intval <int> ip-expire-cnts <int>	A parameter is initially in unconfirmed status. It becomes confirmed if it appears in requests from a certain number of different source IPs within a given time; otherwise, the parameter is discarded. `ip-expire-cnts` defines the number of different source IPs, and `ip-expire-intval` defines the time period. The valid range for `ip-expire-intval` is 1–24 (hours), and the default value is 4. The valid range for `ip-expire-cnts` is 1–5, and the default value is 3.	4/3
ip-argcount-limit {enable \| disable}	Enable to limit each source IP to creating at most 20 new arguments every 30 minutes.	disable
sample-limit-by-ip <sample-limit-by-ip_int>	The maximum number of samples collected from each IP. The valid range is 0–5000.	`30`
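The detection thresholds described in the table can be combined into a small Python sketch. It assumes the model already assigns each parameter value an anomaly probability score (computing that score is out of scope), uses the population standard deviation for σ, and treats a threshold of 0 as disabled. The function names `anomaly_cutoff` and `classify` are hypothetical; the logic only mirrors the `anomaly-detection-threshold`, `sample-length-threshold`, and `hmm-probability-threshold` descriptions.

```python
import statistics
from typing import Sequence


def anomaly_cutoff(training_scores: Sequence[float], strictness: float) -> float:
    """Return mu + strictness * sigma over scores from the collection period."""
    mu = statistics.mean(training_scores)
    sigma = statistics.pstdev(training_scores)  # assumption: population std dev
    return mu + strictness * sigma


def classify(value: str, score: float, cutoff: float,
             sample_length_threshold: int = 0,
             hmm_probability_threshold: float = 0) -> str:
    """Return "direct-anomaly", "anomaly", or "normal" for one parameter value.

    A threshold of 0 disables the corresponding shortcut check.
    """
    # Shortcut checks: values that trip them skip the SVM threat models.
    if sample_length_threshold and len(value) > sample_length_threshold:
        return "direct-anomaly"
    if hmm_probability_threshold and score > hmm_probability_threshold:
        return "direct-anomaly"
    # Regular check: anomalies found here would go on to the SVM threat models.
    if score > cutoff:
        return "anomaly"
    return "normal"


if __name__ == "__main__":
    cutoff = anomaly_cutoff([10, 12, 14], strictness=2)
    print(round(cutoff, 2))             # 15.27
    print(classify("abc", 16, cutoff))  # anomaly
    print(classify("abc", 15, cutoff))  # normal
```

With training scores 10, 12, and 14 and a strictness level of 2, the cutoff is 12 + 2 * 1.633, about 15.27, so a score of 16 is flagged while 15 is not. The `denoise-threshold` check uses the same μ + k * σ form, applied to the noise reduction samples.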