waf bot-detection-policy

Variable	Description	Default
`policy-id <server-policy-id>`	Associate this bot detection policy with the specified server policy.	No default
`model-status {enable \| disable}`	Enable or disable bot detection.	enable
advanced-mode {enable \| disable}	Enable or disable the advanced settings in the bot detection policy.	disable
client-identification-method {IP \| IP-and-User-Agent \| Cookie}	The data collected in one sample should be from the same user. The system uses IP, IP and User-Agent, or Cookie to identify a user. IP: The traffic data in one sample should come from the same source IP. IP and User-Agent: The traffic data in one sample should come from the same source IP and User-Agent (the browser). Cookie: The traffic data in one sample should have the same cookie value.	IP-and-User-Agent
sampling-count <integer>	Controls how many samples are collected during the sample collection period. More samples make the model more accurate, but sample collection takes longer to complete. Not all traffic data is collected as samples. The system discards traffic data that meets any of the following criteria: The client fails the JavaScript challenge that the system sends to user clients before collecting samples from them. The traffic is from malicious IPs reported by the IP Intelligence feature, or is recognized as a bot by the system. The traffic is from Known Engines, such as Google and Bing. The system also skips known engine traffic when executing bot detection. These criteria exclude malicious traffic and traffic from known engines that act like bots, so that the bot detection model is built on valid data collected from regular users.	1000
sampling-count-per-client <integer>	Controls how many samples FortiWeb collects from each client (user) in an hour. For example, if the value is set to 3 and a client generates 10 samples in an hour, the system collects only the first 3 samples from this client in that hour. If the client generates more samples in the second hour, the system resumes collecting samples from this client until the count for that hour reaches 3. This option prevents the system from continuously collecting samples from one client, which limits the interference of bot traffic in the sampling stage.	3
sampling-time-per-vector <integer>	Each vector (also called a sample) records one user's behavior over a fixed time range. This option defines the length of that time range. For example, if the Sample Time Per Vector is 5 minutes, the system records a user's behavior for 5 minutes and counts it as one sample.	5
training-accuracy <userdef>	The training accuracy is calculated by this formula: (number of regular samples in the training sample set / total number of training samples) * 100%. As described in the Basic Concepts section, multiple models are built from multiple parameter combinations in the SVM algorithm. The system uses each model to detect anomalies in the sample set and calculates the training accuracy for each model. For example, if there are 100 training samples and a model treats 90 of them as regular samples, the training accuracy for this model is 90%. The default value is 95%, which means only models whose training accuracy is equal to or higher than 95% are selected as qualified models.	95%
cross-validation <userdef>	The system divides the training sample set evenly into three parts: Part A, Part B, and Part C. It then executes three rounds of bot detection: First, the system observes the samples in Parts A and B to build a mathematical model, then uses this model to detect anomalies in Part C. Next, the system observes the samples in Parts B and C to build a mathematical model, then uses this model to detect anomalies in Part A. Finally, the system observes the samples in Parts A and C to build a mathematical model, then uses this model to detect anomalies in Part B. The cross-validation value is calculated by this formula: (total number of regular samples / total number of samples) * 100%. For example, if there are 100 samples and 10 anomalies are detected in the three rounds, the cross-validation value for this model is (100 - 10)/100 * 100% = 90%. The default cross-validation value is 90%, which means only models whose cross-validation value is equal to or higher than 90% are selected as qualified models.	90%
`testing-accuracy <userdef>`	Three quarters of the samples are assigned to the training sample set, and one quarter to the testing sample set. The system uses the models built from the training sample set to detect anomalies in the testing sample set. If the training accuracy and testing accuracy for a model differ greatly, the model may not be valid. The testing accuracy is calculated by this formula: (number of regular samples in the testing sample set / number of testing samples) * 100%. For example, if there are 100 testing samples and a model treats 95 of them as regular samples, the testing accuracy for this model is 95%. The default testing accuracy is 95%, which means only models whose testing accuracy is equal to or higher than 95% are selected as qualified models.	95%
selected-model {Strict \| Loose}	Multiple models are built during the model building stage. The system uses training accuracy, cross-validation value, and testing accuracy to select qualified models. The Model Type selects the one final model out of all the qualified models. If you set the Model Type to Loose, the system chooses the qualified model with the highest training accuracy. If you set the Model Type to Strict, the system chooses the qualified model with the lowest training accuracy. The Strict model detects more anomalies, but regular users may be falsely detected as bots. The Loose model (also referred to as the Moderate Model) is more lenient. It is less likely to produce false positives, but real bots might escape detection. No option is perfect for every situation. Whichever model type you choose, you can use the other commands to mitigate the side effects, for example, using `bot-confirmation enable` to avoid false positive detections.	loose
anomaly-count <integer>	If the system detects a certain number of anomalies from a user, it takes actions such as sending alert emails or blocking traffic from this user. Anomaly Count controls how many anomalies are allowed for each user. For example, suppose the Anomaly Count is set to 4 and the system has detected 3 anomalies in the last 6 vectors. If the 7th vector is also detected as an anomaly, the system takes action. Note that if no valid traffic is collected for the 7th vector (for example, the user leaves your application), the system clears the anomaly count and the user information. If the user revisits your application, the user is treated as a new user and anomaly counting starts afresh. Because this option tolerates a certain number of anomalies from a user, it can be a good choice if you want to avoid false positive detections.	3
bot-confirmation {enable \| disable}	If the number of anomalies from a user reaches the Anomaly Count, the system executes Bot Confirmation before taking action, to confirm that the user is indeed a bot. The system sends RBE (Real Browser Enforcement) JavaScript or a CAPTCHA to the client to double-check whether it is a real bot.	enable
verification-method {Disable \| Real-Browser-Enforcement \| Captcha-Enforcement}	Disable: Do not execute browser verification. Real Browser Enforcement: The system sends JavaScript to the client to verify whether it is a web browser. CAPTCHA Enforcement: The system requires clients to successfully fulfill a CAPTCHA request. If the traffic is not from a web browser, the system triggers the action policy.	Real-Browser-Enforcement
validation-timeout <integer>	Enter the maximum amount of time (in seconds) that FortiWeb waits for results from the client for Bot Confirmation. The default value is 20. The valid range is 5–30.	20
max-attempt-times <integer>	The maximum number of CAPTCHA enforcement validation attempts. If the client fails the validation the specified number of times, the system triggers the action policy. This is only available if `verification-method` is set to `Captcha-Enforcement`.	3
mobile-verification-method {Disable \| Mobile-Token-Validation}	Disable: The system does not verify whether the sample traffic is from mobile devices. Mobile-Token-Validation: The system verifies the mobile token to confirm whether the traffic is from mobile devices.	disable
auto-refresh {enable \| disable}	If enabled, FortiWeb checks whether the current model is still applicable. If it is not, FortiWeb refreshes the model automatically.	enable
refresh-factor <userdef>	Auto Refresh Factor controls when a model refresh is triggered, based on the number of false positive vectors detected. FortiWeb keeps bot detection statistics for the past 24 hours. It counts the following vectors: all vectors in the past 24 hours (A), anomaly vectors (B), and anomaly vectors that are confirmed as bots (C). If (B - C)/(A - C) > 1 - Auto Refresh Factor * training accuracy, the model is refreshed. (B - C) is the number of false positive vectors, and (A - C) is the number of regular vectors, so (B - C)/(A - C) represents the false positive rate. (1 - Auto Refresh Factor * training accuracy) is an adjusted anomaly vector rate. You can consider it as an auto refresh threshold. If the false positive rate becomes greater than the auto refresh threshold, the system determines that the current model is not applicable and automatically refreshes the model. The following table calculates the value of the auto refresh threshold when the Auto Refresh Factor is set to 0-1 (assuming the training accuracy is the default value 95%). For example, if the Auto Refresh Factor is set to 0.8, the auto refresh threshold is 1 - 0.8 * 95% = 0.24, which means the system automatically refreshes the model when the false positive rate is greater than 0.24 (e.g. 24 false positive vectors and 100 regular vectors). You can use this table to quickly decide a value for the Auto Refresh Factor that is suitable for your situation.	0.7
minimum-vector-number <integer>	As described for `refresh-factor`, the system decides whether to update the bot detection model based on statistics from the past 24 hours. If very few vectors are detected in that period, the refresh decision may be unreliable. Set a value for the Minimum Vector Number so that the system does not update the model until the number of vectors reaches this value. If the value is set to 0, the system uses the value of the Sample Count as the Minimum Vector Number.	0
action {alert \| deny_no_log \| alert_deny \| block-period}	The action FortiWeb takes when a user client is confirmed as a bot: alert: Accepts the connection and generates an alert email and/or log message. deny_no_log: Blocks the request. No logs are generated. alert_deny: Blocks the request (or resets the connection) and generates an alert and/or log message. block-period: Blocks the request for a certain period of time.	alert
block-period <integer>	Enter the number of seconds to block the requests. The valid range is 1–3,600 seconds. This option takes effect only when `action` is set to `block-period`.	600
severity {High \| Medium \| Low \| Info}	Select the severity level for this anomaly type. The severity level will be displayed in the alert email and/or log message.	High
trigger <trigger-policy-name>	Select a trigger policy. If an anomaly is detected, it will trigger the system to send email and/or log messages according to the trigger policy.	No default
<ip-address>	If specified, the system collects sample data only from these IP addresses.	No default
host <string>	The system collects samples from any IP address except the specified IP address or FQDN of a protected host.	No default
host-status {enable \| disable}	Enable or disable comparing the URLs to the `Host:` field in the HTTP header.	enable
url-type {plain \| regular}	Specify how the Exception URL is matched: plain: The field is a string that the Exception URL must match exactly. regular: The field is a regular expression that defines a set of matching URLs.	No default
url-pattern <string>	Depending on the `url-type`, enter either: plain: The literal URL, such as `/index.php`, that the HTTP request must contain in order to match the rule. The URL must begin with a slash ( `/` ). regular: A regular expression, such as `^/*.php`, matching the URLs to which the rule should apply. The pattern does not require a slash ( `/` ), but it must match URLs that begin with a slash, such as `/index.cfm`. Do not include the domain name, such as `www.example.com`, which is configured separately in `[bot-detection-exception-list] <No.> host <string>`.	No default
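
The three accuracy thresholds and `selected-model` in the table can be read as a filter followed by a final pick. The Python sketch below illustrates that reading only; the `Model` record, its field names, and the sample scores are assumptions made for the example and do not describe FortiWeb internals.

```python
from dataclasses import dataclass


@dataclass
class Model:
    """Hypothetical record of one candidate model's scores (fractions 0-1)."""
    name: str
    training_accuracy: float
    cross_validation: float
    testing_accuracy: float


def accuracy(regular_samples: int, total_samples: int) -> float:
    """Share of samples a model treats as regular (the formula in the table)."""
    return regular_samples / total_samples


def qualified(models, training=0.95, cross=0.90, testing=0.95):
    """Keep models that meet all three thresholds (defaults from the table)."""
    return [
        m for m in models
        if m.training_accuracy >= training
        and m.cross_validation >= cross
        and m.testing_accuracy >= testing
    ]


def select(models, model_type="Loose"):
    """Loose picks the highest training accuracy, Strict the lowest."""
    if not models:
        return None
    pick = max if model_type == "Loose" else min
    return pick(models, key=lambda m: m.training_accuracy)


# Made-up candidate scores, for illustration only.
candidates = [
    Model("a", accuracy(90, 100), 0.93, 0.96),  # below the training threshold
    Model("b", accuracy(96, 100), 0.92, 0.95),
    Model("c", accuracy(99, 100), 0.91, 0.97),
]
ok = qualified(candidates)
print([m.name for m in ok])
print(select(ok, "Loose").name, select(ok, "Strict").name)
```

With these sample scores, model `a` is filtered out, Loose selects `c`, and Strict selects `b`.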

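The `refresh-factor` rule, (B - C)/(A - C) > 1 - Auto Refresh Factor * training accuracy, can be checked with a few lines of arithmetic. The following Python sketch is a hedged illustration: the function names are hypothetical, and it assumes that the vector count compared against `minimum-vector-number` is the total number of vectors in the past 24 hours (A).

```python
def refresh_threshold(refresh_factor: float, training_accuracy: float = 0.95) -> float:
    """Auto refresh threshold: 1 - Auto Refresh Factor * training accuracy."""
    return 1 - refresh_factor * training_accuracy


def should_refresh(all_vectors, anomaly_vectors, confirmed_bots,
                   refresh_factor=0.7, training_accuracy=0.95,
                   minimum_vector_number=0, sampling_count=1000):
    """Apply the documented rule (B - C)/(A - C) > threshold.

    all_vectors = A, anomaly_vectors = B, confirmed_bots = C,
    all counted over the past 24 hours.
    """
    # A minimum of 0 means "use the Sample Count", as the table states.
    minimum = minimum_vector_number or sampling_count
    if all_vectors < minimum:  # assumption: the minimum is compared with A
        return False
    regular = all_vectors - confirmed_bots
    if regular <= 0:
        return False  # assumption: with no regular vectors there is no rate
    false_positive_rate = (anomaly_vectors - confirmed_bots) / regular
    return false_positive_rate > refresh_threshold(refresh_factor, training_accuracy)


# Threshold for several factor values at the default 95% training accuracy.
for factor in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    print(f"{factor:.1f} -> {refresh_threshold(factor):.2f}")
```

A factor of 0.8 reproduces the 0.24 threshold from the description. A lower factor raises the threshold, so the model is refreshed less readily.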