Administration Guide

Configuring machine-learning URL replacer policy

This section describes how to configure a machine-learning URL replacer policy, which is required when your application uses dynamic URLs or unusual parameter styles. Such applications are uncommon, so most deployments do not need a URL replacer policy.

The URL replacer policy can be referenced in ML Based API Protection and ML Based Anomaly Detection.

Configure a URL replacer rule

URL replacer rules enable the machine-learning module to adapt to dynamic URLs and unusual parameters.

When web applications have dynamic URLs or unusual parameter styles, you must adapt the URL Replacer Rule to recognize them.

By default, machine learning assumes that your web applications use the most common URL structure, which has the following characteristics:

  • All parameters follow a question mark (?). They do not follow a hash (#) or any other separator character.
  • If there are multiple name-value pairs, each pair is separated by an ampersand (&). They are not separated by a semi-colon (;) or any other separator character.
  • All paths before the question mark (?) are static—they do not change based upon input, blending the path with parameters (sometimes called a dynamic URL).

For example, the page at

/app/main

always has that same path. After you log in, the page’s URL does not become

/app/marco/main

or

/app#deepa

For another example, the URL does not dynamically reflect the inventory, such as:

/app/sprockets/widget1024894
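
The conventional structure described above can be illustrated with a short Python sketch. It uses only the standard library and is not FortiWeb code; the function name split_conventional and the sample parameter names are assumptions made for illustration.

```python
from urllib.parse import urlsplit, parse_qsl

def split_conventional(url):
    """Split a URL into its static path and its list of (name, value) parameters."""
    parts = urlsplit(url)
    # Parameters are expected after "?" and separated by "&".
    return parts.path, parse_qsl(parts.query)

# A conventional URL: static path, parameters after the question mark.
conventional = split_conventional("/app/main?user=marco&item=1024894")

# A dynamic URL: the inventory item is part of the path, so no parameters are found.
dynamic = split_conventional("/app/sprockets/widget1024894")
```

In the second case the item identifier is invisible to a parser that only looks after the question mark, which is the situation a URL replacer is meant to correct.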

Some web applications, however, embed parameters within the path structure of a URL, or use unusual or non-uniform parameter separator characters. If you do not configure URL replacers to handle such variations, it can cause the system to gather machine learning data incorrectly, which can lead to the following consequences:

  • Machine-learning reports do not contain the correct URL structure.
  • URL/API path- or parameter-learning is endless.
  • Parameter data is incomplete, despite the fact that the FortiWeb appliance has seen traffic containing the parameter.

For example, with Microsoft Outlook Web App (OWA), the user’s login name could be embedded within the path structure of the URL, such as:

/owa/tom/index.html

/owa/mary/index.html

instead of suffixed as a parameter, such as:

/owa/index.html?username=tom

/owa/index.html?username=mary

Machine learning will continue to create new URLs as new users are added to OWA. It will also expend extra resources learning about URLs and parameters that are actually the same. Additionally, machine learning may not be able to fully learn the application structure because each user may not request the same URLs.

To address this issue, you must create a URL Replacer Rule that recognizes the user name within the OWA URL as if it were a standard, suffixed parameter value so that machine learning can function properly.
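
As a rough illustration of the idea, the following Python sketch rewrites OWA-style paths so that the user name becomes a suffixed parameter. The regular expression and the normalize function are hypothetical and are not the predefined OWA replacer shipped with FortiWeb.

```python
import re

# Hypothetical pattern: the path segment right after /owa/ is treated as the user name.
OWA_USER = re.compile(r"^(/owa/)([^/]+)/(.*)$")

def normalize(url):
    """Rewrite /owa/<user>/<page> as /owa/<page>?username=<user>."""
    m = OWA_USER.match(url)
    if m is None:
        return url  # leave URLs that do not fit the pattern untouched
    prefix, user, rest = m.groups()
    return f"{prefix}{rest}?username={user}"

requests = ["/owa/tom/index.html", "/owa/mary/index.html"]

# After normalization, both requests share a single path to learn.
distinct_paths = {normalize(u).split("?", 1)[0] for u in requests}
```

Both sample requests collapse to the same path, so new users no longer produce new URLs to learn.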

To create a URL Replacer Rule:

  1. Click Machine Learning > Machine Learning Templates.
  2. Click the URL Replacer Rule tab.
  3. Click Create New.
  4. Configure the parameters as described in the table below.
  5. Click OK when done.
Parameters Function
Name

Specify a unique name that can be referenced by other parts of the configuration.

Note: The name can be up to 63 characters long, with no spaces or special characters.

Type

Select either of the following:

  • Predefined—Use one of the predefined URL replacers which can be selected from the Application Type below.
  • Custom-Defined—Define your own URL replacer by configuring the URL Path, New URL, Param Change, and New Param fields below.
Application Type

If you have selected Predefined in the Type field above, then you must click the down arrow and select either of the following from the list menu:

  • JSP—Use the URL replacer designed for Java server pages (JSP) web applications, where parameters are often separated by semi-colon (;).
  • OWA 2003— Use the URL replacer designed for default URLs in Microsoft Outlook Web App (OWA), where user name and directory parameters are often embedded within the URL, as illustrated below:

(^/public/)(.*)

(^/exchange/)([^/]+)/*(([^/]+)/(.*))*

Note: These two application types are predefined URL interpreter plug-ins used by popular web applications.

Custom-Defined

If you have selected Custom-Defined in the Type field above, then you must populate the following fields:

URL Path

Enter a regular expression, such as (^/[^/]+)/(.*), matching all and only the URLs to which the URL replacer should apply. The URL path can be up to 256 characters long.

The pattern does not require a slash (/). However, it must at least match URLs that begin with a slash as they appear in the HTTP header, such as /index.html. Do not include the domain name, such as www.example.com.

To test the regular expression against a sample text, click the >> (Test) icon. This opens the Regular Expression Validator dialog where you can fine-tune the expression.

Note: If this URL replacer is to be used sequentially in a set of URL replacers, instead of being mutually exclusive, this regular expression must match the URL produced by the preceding interpreter rather than the original URL from the request.

New URL

Enter either a literal URL, such as /index.html, or a regular expression with a back-reference (such as $1) defining how the URL will be interpreted. The new URL can be up to 256 characters long.

Note: Back-references can only refer to capture groups (parts of the expression surrounded with parentheses) within the same URL replacer, and must not refer to capture groups in other URL replacers.

Param Change

Enter either the parameter’s literal value, such as user1, or a back-reference (such as $0) defining how the value will be interpreted.

New Param

Type either the parameter’s literal name, such as username, or a back-reference (such as $2) defining how the parameter’s name will be interpreted in the auto-learning report. You can use up to 256 characters.

Note: Back-references can only refer to capture groups (parts of the expression surrounded with parentheses) within the same URL replacer. They must not refer to capture groups in other URL replacers.
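
Before entering a URL Path expression, it can help to check offline that it matches all and only the intended URLs. The sketch below uses Python's re module, whose syntax may differ in details from the engine used by FortiWeb, so the Regular Expression Validator remains the authoritative test. The function name check_pattern and the sample URLs are assumptions.

```python
import re

def check_pattern(pattern, should_match, should_not_match):
    """Return (missed, unexpected): URLs wrongly skipped and URLs wrongly matched."""
    rx = re.compile(pattern)
    missed = [u for u in should_match if rx.search(u) is None]
    unexpected = [u for u in should_not_match if rx.search(u) is not None]
    return missed, unexpected

# The sample expression from the URL Path description above.
result = check_pattern(
    r"(^/[^/]+)/(.*)",
    should_match=["/owa/tom/index.html", "/app/main"],
    should_not_match=["/index.html"],
)
```

Two empty lists mean the expression behaved as intended on the samples. A non-empty list names the URLs that need attention.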

Example

Let's suppose param1 is accessible behind multiple dynamic URLs:

/sales/car/XXX/?param1=<value>

where the XXX path segment can take many different values, one for each car model.

Then the URL Replacer rule would be set as follows:

URL Path (/car/)([^/]+)/(.*)
New URL $0$2
Param Change $1
New Param model

In this example, the machine learning model needs to track "param1" just after the "XXX" dynamic path:

/sales/car/XXX/?param1=<value>

Let's assign a position number to each part of the URL around the dynamic path XXX. Each position corresponds to a capture group in the URL Path expression, numbered from 0:

/car/ is at position 0 (the static path just before the dynamic path XXX)

XXX is at position 1 (the dynamic path itself)

?param1=<value> is at position 2 (the parameter that the machine learning model will track after the dynamic path XXX)

So, for the URL Path (/car/)([^/]+)/(.*), the machine learning model treats /car/ as position 0.

For the New URL $0$2, the machine learning model builds the new URL "/car/?param1=<value>" from position 0 "/car/" followed by position 2 "?param1=<value>".

For the Param Change "$1", the machine learning model creates a new "dummy" parameter whose value is the dynamic value of XXX found at position 1.

For the New Param "model", this "dummy" parameter is named "model".
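
The walkthrough above can be reproduced with a small Python sketch. It is only an emulation under stated assumptions: apply_rule is a hypothetical helper, back-references are numbered from 0 as in the example, and the expression is allowed to match anywhere in the URL (which is what lets (/car/) match after /sales). FortiWeb's own matching behavior may differ.

```python
import re

def apply_rule(url, url_path, new_url, param_change, new_param):
    """Emulate one URL replacer rule; returns (new URL, (param name, param value))."""
    m = re.search(url_path, url)
    if m is None:
        return url, None
    groups = m.groups()  # groups[0] corresponds to $0, groups[1] to $1, and so on

    def expand(template):
        # Replace each $N with the text of capture group N (zero-based).
        return re.sub(r"\$(\d+)", lambda r: groups[int(r.group(1))] or "", template)

    return expand(new_url), (expand(new_param), expand(param_change))

rewritten = apply_rule(
    "/sales/car/sedan/?param1=42",   # "sedan" stands in for XXX
    url_path=r"(/car/)([^/]+)/(.*)",
    new_url="$0$2",
    param_change="$1",
    new_param="model",
)
```

With these inputs, the dynamic segment is removed from the URL and reported as the value of the "model" parameter.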

Configuring a URL replacer policy

In order to use URL Replacer Rules with a machine-learning policy, you must group URL replacer rules into sets, which form URL replacer policies.

The sets can be mutually exclusive, where a set contains expressions for all possible URL structures, but only one of the URL replacer rules will match a given request’s URL.

The sets can also be sequential, where a set contains expressions to interpret multiple parameters in a single URL. In a sequential set, each rule's URL input is the URL output of the preceding rule, and the rules parse the URL in turn until all parameters have been extracted. The order in which the rules are applied is determined by each URL replacer rule's priority in the set.
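
A sequential set can be pictured with the following Python sketch, in which each rule receives the URL produced by the previous one. The Rule type, the sample shop URLs, and the assumption that a lower priority number runs first are all illustrative and are not taken from FortiWeb.

```python
import re
from typing import NamedTuple

class Rule(NamedTuple):
    priority: int      # assumption: lower numbers are applied first
    url_path: str
    new_url: str
    param_change: str
    new_param: str

def expand(template, groups):
    """Replace each $N in the template with zero-based capture group N."""
    return re.sub(r"\$(\d+)", lambda r: groups[int(r.group(1))] or "", template)

def apply_sequential(url, rules):
    """Apply rules in priority order, feeding each rule the previous rule's output."""
    params = {}
    for rule in sorted(rules, key=lambda r: r.priority):
        m = re.search(rule.url_path, url)
        if m is None:
            continue  # a rule that does not match leaves the URL unchanged
        groups = m.groups()
        params[expand(rule.new_param, groups)] = expand(rule.param_change, groups)
        url = expand(rule.new_url, groups)
    return url, params

rules = [
    Rule(2, r"^(/shop/)(widgets|gadgets)/(.*)", "$0$2", "$1", "category"),
    Rule(1, r"^(/shop/)([^/]+)/(.*)", "$0$2", "$1", "region"),
]
final_url, extracted = apply_sequential("/shop/eu/widgets/list", rules)
```

Note that the second rule's expression is written against /shop/widgets/list, the output of the first rule, not against the original request URL.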

To configure a URL replacer policy:

  1. Click Machine Learning > Machine Learning Templates.
  2. Click the URL Replacer Policy tab.
  3. Click Create New.
  4. In Name, type a name that can be referenced by other parts of the configuration. Note: The name can be up to 63 characters long, with no spaces or special characters.
  5. Click OK.
  6. Click Create New, and select the URL replacer rule to be grouped in the URL replacer policy.
  7. Click OK.

Note: You can select a URL replacer policy in one or more machine-learning policies, including Anomaly Detection and API Protection policies.
