Web content filter

You can control access to web content by blocking web pages that contain specific words or patterns, which helps prevent access to pages with questionable material. You can specify words, phrases, patterns, wildcards, and Perl regular expressions to match content on web pages. You can create multiple web content filter lists and select the most appropriate list for each Web Filter profile.

Pattern type

After you create a web content filter list, add web content patterns to it. There are two types of patterns: wildcard and regular expression.

Wildcard

Use the wildcard setting to block or exempt a single word or a text string of up to 80 characters. You can also use the wildcard symbols ? and *: ? represents any single character, and * represents any character appearing any number of times. For example, the wildcard expression forti*.com matches fortinet.com and forticare.com.
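The wildcard behavior described above can be tried out with a minimal Python sketch. It assumes that Python's shell-style fnmatch rules are a reasonable stand-in for FortiGate wildcard matching; the helper name wildcard_match is hypothetical, and fnmatch compares the whole string, which may differ from how FortiGate scans page content.

```python
from fnmatch import fnmatchcase


def wildcard_match(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the shell-style wildcard pattern.

    Assumption: fnmatch semantics approximate FortiGate wildcard patterns.
    """
    return fnmatchcase(text, pattern)


# * stands for any run of characters, including an empty one.
for host in ("fortinet.com", "forticare.com", "example.com"):
    print(host, wildcard_match("forti*.com", host))

# ? stands for exactly one character.
print(wildcard_match("fortine?.com", "fortinet.com"))
```

The first two host names match forti*.com and example.com does not.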

Regular expression

Use the regular expression setting to block or exempt content that matches a Perl regular expression. Regular expressions use some of the same symbols as wildcard expressions, but for different purposes. In a regular expression, * matches the character before it zero or more times. For example, forti*.com matches fortiii.com but not fortinet.com or fortiice.com. In this case, * represents i appearing any number of times.
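The difference from wildcard matching can be checked with a short Python sketch. It assumes that Python's re module agrees with Perl regular expressions for patterns this simple; the helper name regex_match is hypothetical. Note that the unescaped . in forti*.com matches any single character, not only a literal dot.

```python
import re


def regex_match(pattern: str, text: str) -> bool:
    """Return True if the regular expression matches anywhere in the text."""
    return re.search(pattern, text) is not None


# In a regular expression, i* means "the letter i, zero or more times".
for host in ("fortiii.com", "fortinet.com", "fortiice.com"):
    print(host, regex_match(r"forti*.com", host))

# A regular expression that behaves like the wildcard forti*.com:
# .* matches any run of characters and \. matches a literal dot.
print(regex_match(r"forti.*\.com", "fortinet.com"))
```

Only fortiii.com matches forti*.com as a regular expression, while forti.*\.com also matches fortinet.com.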

The maximum number of web content patterns in a list is 5000.

Content evaluation

The web content filter scans the content of every web page that is accepted by a security policy. The system administrator can specify banned words and phrases and attach a numerical value, or score, to each one to reflect its importance. When the scan detects banned content, it adds up the scores of the banned words and phrases found on that page. If the sum reaches or exceeds the threshold set in the Web Filter profile, FortiGate blocks the page.

The default score for a web content pattern is 10, and the default threshold is 10. By default, therefore, a single match blocks a web page.

Banned words or phrases are evaluated according to the following rules:

  • The score for each word or phrase is counted only once, even if that word or phrase appears many times in the web page.
  • The score for any word in a phrase without quotation marks is counted.
  • The score for a phrase in quotation marks is counted only if it appears exactly as written.

Sample of applying banned pattern rules

The following table shows how these rules are applied to the contents of a web page. Assume a web page that contains only this sentence:

The score for each word or phrase is counted only once, even if that word or phrase appears many times in the web page.

| Banned pattern | Assigned score | Score added to the sum for the entire page | Threshold score | Comment |
|---|---|---|---|---|
| word | 20 | 20 | 20 | Appears twice but only counted once. Web page is blocked. |
| word phrase | 20 | 40 | 20 | Each word appears twice but only counted once giving a total score of 40. Web page is blocked. |
| word sentence | 20 | 20 | 20 | “word” appears twice, “sentence” does not appear, but since any word in a phrase without quotation marks is counted, the score for this pattern is 20. Web page is blocked. |
| "word sentence" | 20 | 0 | 20 | This phrase does not appear exactly as written. Web page is allowed. |
| "word or phrase" | 20 | 20 | 20 | This phrase appears twice but is counted only once. Web page is blocked. |
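The scoring rules and the table rows can be reproduced with a small Python sketch. This is an illustration of the rules as stated here, not FortiGate's implementation: the function names are hypothetical, and case-insensitive matching and the simple word splitting are assumptions. A page is treated as blocked when its score is at or above the threshold, consistent with the rows in which a score of 20 against a threshold of 20 blocks the page.

```python
import re

PAGE = ("The score for each word or phrase is counted only once, even if "
        "that word or phrase appears many times in the web page.")


def pattern_score(pattern: str, score: int, page_text: str) -> int:
    """Return the score one banned pattern adds to the page total."""
    text = page_text.lower()  # assumption: matching ignores case
    pattern = pattern.strip()
    if len(pattern) >= 2 and pattern[0] == '"' and pattern[-1] == '"':
        # Quoted phrase: counted once, only if it appears exactly as written.
        phrase = pattern[1:-1].lower()
        return score if phrase in text else 0
    # Unquoted phrase: every distinct word found on the page counts once.
    page_words = set(re.findall(r"\w+", text))
    terms = set(pattern.lower().split())
    return score * len(terms & page_words)


def is_blocked(total_score: int, threshold: int) -> bool:
    """Assumption: the page is blocked once the total reaches the threshold."""
    return total_score >= threshold


for banned in ("word", "word phrase", "word sentence",
               '"word sentence"', '"word or phrase"'):
    added = pattern_score(banned, 20, PAGE)
    print(banned, added, "blocked" if is_blocked(added, 20) else "allowed")
```

Each row is evaluated on its own here. With several patterns in one list, the page total would be the sum of the individual results.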

Sample configuration

To configure web content filter in the GUI:
  1. Go to Security Profiles > Web Filter and go to the Static URL Filter section.
  2. Enable Content Filter to display its options.

  3. Select Create New to display the content filter options.

  4. For Pattern Type, select Regular Expression and enter fortinet in the Pattern field.
    • Leave Language as Western.
    • Set Action to Block.
    • Set Status to Enable.

  5. Select OK to see the updated Static URL Filter section.

  6. Validate the configuration by visiting a website with the word fortinet, for example, www.fortinet.com. The website is blocked and a replacement page displays.

To configure web content filter in the CLI:
  1. Create a content table:
    config webfilter content
       edit 1                           <-- the id of this content
          set name "webfilter"
          config entries
             edit "fortinet"            <-- the banned word
                set pattern-type regexp <-- the type is regular expression
                set status enable
                set lang western
                set score 10            <-- the score for this word is 10
                set action block
             next
          end
       next
    end
  2. Attach the content table to the Web Filter profile:
    config webfilter profile
       edit "webfilter"
          config web
             set bword-threshold 10  <-- the threshold is 10
             set bword-table 1       <-- the id of content table we created in the previous step
          end
          config ftgd-wf
             unset options
          end
       next
    end
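When a list holds many patterns, the content table from step 1 can be generated rather than typed by hand. The following Python sketch only renders CLI text using the keywords shown above; it does not talk to a FortiGate. The ContentEntry class and render_content_table function are hypothetical names, and the defaults simply mirror the sample values.

```python
from dataclasses import dataclass


@dataclass
class ContentEntry:
    """One banned pattern; defaults mirror the sample configuration."""
    pattern: str
    pattern_type: str = "regexp"
    status: str = "enable"
    lang: str = "western"
    score: int = 10
    action: str = "block"


def render_content_table(table_id: int, name: str,
                         entries: list[ContentEntry]) -> str:
    """Render a 'config webfilter content' block as CLI text.

    Patterns are inserted as-is; quoting or escaping is not handled.
    """
    lines = [
        "config webfilter content",
        f"    edit {table_id}",
        f'        set name "{name}"',
        "        config entries",
    ]
    for entry in entries:
        lines += [
            f'            edit "{entry.pattern}"',
            f"                set pattern-type {entry.pattern_type}",
            f"                set status {entry.status}",
            f"                set lang {entry.lang}",
            f"                set score {entry.score}",
            f"                set action {entry.action}",
            "            next",
        ]
    lines += ["        end", "    next", "end"]
    return "\n".join(lines)


print(render_content_table(1, "webfilter", [ContentEntry("fortinet")]))
```

The printed text matches the structure of the content table in step 1 and can be reviewed before it is pasted into the CLI.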
