Handbook

6.0.0

Wildcards and Perl regular expressions

Many Security Profiles feature list entries can include wildcards or Perl regular expressions.

For more information about using Perl regular expressions, see http://perldoc.perl.org/perlretut.html.

Regular expression vs. wildcard match pattern

A wildcard character is a special character that represents one or more other characters. The most commonly used wildcard characters are the asterisk (*), which typically represents zero or more characters in a string of characters, and the question mark (?), which typically represents any one character.

In Perl regular expressions, the ‘.’ character matches any single character, much like the ‘?’ character in a wildcard match pattern. As a result:

  • example.com matches not only example.com but also exampleacom, examplebcom, exampleccom, and so on.

Note: To add a question mark (?) character to a regular expression from the FortiGate CLI, enter Ctrl+V followed by ?. To add a single backslash character (\) to a regular expression from the CLI, you must precede it with another backslash character. For example, example\\.com.
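The backslash-doubling rule in the note can be sketched as a small helper. This is only an illustration in Python; the function name double_backslashes is hypothetical, and it covers only the backslash rule stated above, not any other CLI quoting behavior.

```python
def double_backslashes(pattern):
    """Return the pattern with every backslash doubled, as the note
    describes for entering a regular expression from the CLI.
    (Hypothetical helper; it handles backslashes only.)"""
    return pattern.replace("\\", "\\\\")


# The regular expression example\.com would be typed as example\\.com
print(double_backslashes(r"example\.com"))
```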

To match a special character such as ‘.’ or ‘*’ literally, precede it with the escape character ‘\’. For example:

  • To match example.com, the regular expression should be: example\.com
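As a rough illustration of the difference between an unescaped and an escaped period, the following sketch uses Python's re module as a stand-in for a Perl-compatible engine. That substitution is an assumption made for the example; the host names are sample data.

```python
import re

# Unescaped dot: any single character is accepted in that position.
loose = re.compile(r"example.com")
# Escaped dot: only a literal period is accepted.
strict = re.compile(r"example\.com")

hosts = ["example.com", "exampleacom", "examplebcom"]

loose_hits = [h for h in hosts if loose.search(h)]
strict_hits = [h for h in hosts if strict.search(h)]

print(loose_hits)   # ['example.com', 'exampleacom', 'examplebcom']
print(strict_hits)  # ['example.com']
```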

In Perl regular expressions, ‘*’ matches zero or more occurrences of the character before it, not zero or more occurrences of any character. For example:

  • exam*.com matches exammmm.com but does not match example.com

To match any character zero or more times, use ‘.*’, where ‘.’ means any character and ‘*’ means zero or more times. For example, the wildcard match pattern exam*.com should be written as the regular expression exam.*\.com.
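The two readings of exam*.com can be compared directly. The sketch below again assumes Python's re module as a stand-in engine, and wildcard_to_regex is a hypothetical helper that handles only the ‘*’ and ‘?’ wildcards described in this section.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into an equivalent regular expression.
    Hypothetical helper: '*' becomes '.*', '?' becomes '.', and every
    other character is escaped so that it matches literally."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


# Read as a regular expression: "exa", zero or more "m", any one character, "com".
star_only = re.compile(r"exam*.com")
# The regular expression that expresses the wildcard's intent.
converted = re.compile(wildcard_to_regex("exam*.com"))

print(wildcard_to_regex("exam*.com"))  # exam.*\.com
for host in ("exammmm.com", "example.com"):
    print(host, bool(star_only.search(host)), bool(converted.search(host)))
```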

Word boundary

In Perl regular expressions, the pattern does not have an implicit word boundary. For example, the regular expression “test” not only matches the word “test” but also any word that contains “test” such as “atest”, “mytest”, “testimony”, “atestb”. The notation “\b” specifies the word boundary. To match exactly the word “test”, the expression should be \btest\b.
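The effect of \b can be checked with a short sketch. Python's re module is assumed here as a stand-in for a Perl-compatible engine, and the word list is sample data.

```python
import re

plain = re.compile(r"test")        # no boundary: matches inside longer words
bounded = re.compile(r"\btest\b")  # matches "test" only as a whole word

words = ["test", "atest", "mytest", "testimony", "atestb", "a test case"]

plain_hits = [w for w in words if plain.search(w)]
bounded_hits = [w for w in words if bounded.search(w)]

print(plain_hits)    # every entry contains "test"
print(bounded_hits)  # ['test', 'a test case']
```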

Case sensitivity

Regular expression pattern matching is case sensitive in the Web Filter and Email Filter. To make a word or phrase case insensitive, append the /i option to the regular expression. For example, /bad language/i will block all instances of “bad language”, regardless of case.
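A minimal sketch of the same behavior, assuming Python's re module as a stand-in engine, where the re.IGNORECASE flag plays the role of the /i option:

```python
import re

sensitive = re.compile(r"bad language")
insensitive = re.compile(r"bad language", re.IGNORECASE)  # comparable to /bad language/i

samples = ["bad language", "Bad Language", "BAD LANGUAGE"]

sensitive_hits = [s for s in samples if sensitive.search(s)]
insensitive_hits = [s for s in samples if insensitive.search(s)]

print(sensitive_hits)    # ['bad language']
print(insensitive_hits)  # all three samples
```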

Perl regular expression formats

The following table lists and describes some example Perl regular expressions.

Perl regular expression formats

Expression        Matches
abc               “abc” (the exact character sequence, but anywhere in the string)
^abc              “abc” at the beginning of the string
abc$              “abc” at the end of the string
a|b               Either “a” or “b”
^abc|abc$         The string “abc” at the beginning or at the end of the string
ab{2,4}c          “a” followed by two, three, or four “b”s followed by a “c”
ab{2,}c           “a” followed by at least two “b”s followed by a “c”
ab*c              “a” followed by any number (zero or more) of “b”s followed by a “c”
ab+c              “a” followed by one or more “b”s followed by a “c”
ab?c              “a” followed by an optional “b” followed by a “c”; that is, either “abc” or “ac”
a.c               “a” followed by any single character (not newline) followed by a “c”
a\.c              “a.c” exactly
[abc]             Any one of “a”, “b”, and “c”
[Aa]bc            Either of “Abc” and “abc”
[abc]+            Any (nonempty) string of “a”s, “b”s, and “c”s (such as “a”, “abba”, “acbabcacaa”)
[^abc]+           Any (nonempty) string which does not contain any of “a”, “b”, and “c” (such as “defg”)
\d\d              Any two decimal digits, such as 42; same as \d{2}
/i                Makes the pattern case insensitive. For example, /bad language/i blocks any instance of “bad language” regardless of case.
\w+               A “word”: a nonempty sequence of alphanumeric characters and low lines (underscores), such as foo, 12bar8, and foo_1
100\s*mk          The strings “100” and “mk” optionally separated by any amount of white space (spaces, tabs, newlines)
abc\b             “abc” when followed by a word boundary (for example, in “abc!” but not in “abcd”)
perl\B            “perl” when not followed by a word boundary (for example, in “perlert” but not in “perl stuff”)
/x                Tells the regular expression parser to ignore white space that is neither preceded by a backslash character nor within a character class. Use this to break up a regular expression into (slightly) more readable parts.
/pattern/options  Used to add regular expressions within other text. If the first character in a pattern is a forward slash ‘/’, the ‘/’ is treated as the delimiter, and the pattern must contain a second ‘/’. The text between the two ‘/’ characters is taken as the regular expression, and anything after the second ‘/’ is parsed as a list of regular expression options (‘i’, ‘x’, and so on). An error occurs if the second ‘/’ is missing. Leading and trailing spaces are treated as part of the regular expression.
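The delimiter rule described in the last table entry can be sketched as a small parser. This is a minimal illustration only: compile_entry and OPTION_FLAGS are hypothetical names, Python's re module stands in for the Perl-compatible engine, and only the ‘i’ and ‘x’ options named in the text are mapped.

```python
import re

# Assumed mapping from option letters to Python flags (illustrative only).
OPTION_FLAGS = {"i": re.IGNORECASE, "x": re.VERBOSE}


def compile_entry(entry):
    """Compile a list entry written as a bare regex or as /regex/options."""
    if not entry.startswith("/"):
        return re.compile(entry)
    end = entry.find("/", 1)  # position of the second '/'
    if end == -1:
        raise ValueError("missing second '/' delimiter")
    body, options = entry[1:end], entry[end + 1:]
    flags = 0
    for letter in options:
        if letter not in OPTION_FLAGS:
            raise ValueError("unsupported option: %r" % letter)
        flags |= OPTION_FLAGS[letter]
    return re.compile(body, flags)


# A few rows from the table, checked against sample strings.
print(bool(compile_entry(r"ab{2,4}c").search("abbbc")))           # True
print(bool(compile_entry(r"abc\b").search("abcd")))               # False
print(bool(compile_entry(r"perl\B").search("perlert")))           # True
print(bool(compile_entry("/bad language/i").search("BAD Language")))  # True
print(bool(compile_entry(r"/100 \s* mk/x").search("100mk")))      # True
```

Splitting on the second ‘/’ follows the wording of the table entry; an engine that allows ‘/’ inside the expression itself would need a different rule.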

Examples of regular expressions

Block any word in a phrase

/block|any|word/
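Because the alternation has no word boundaries, it also matches inside longer words. The sketch below, which assumes Python's re module as a stand-in engine, contrasts the expression above with a \b-anchored variant; the variant is an illustration, not part of the original example.

```python
import re

any_word = re.compile(r"block|any|word")
# Variant that only matches the listed words as whole words.
whole_word = re.compile(r"\b(?:block|any|word)\b")

texts = ["a single word", "company policy", "nothing here"]

any_word_hits = [bool(any_word.search(t)) for t in texts]
whole_word_hits = [bool(whole_word.search(t)) for t in texts]

print(any_word_hits)    # [True, True, False]  ("company" contains "any")
print(whole_word_hits)  # [True, False, False]
```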

Block purposely misspelled words

Spammers often insert other characters between the letters of a word to fool spam blocking software.

/^.*v.*i.*a.*g.*r.*o.*$/i

/cr[eéèêë][\+\-\*=<>\.\,;!\?%&§@\^°\$£\{\}()\[\]\|\\_01]dit/i
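A short sketch of how these two expressions behave, assuming Python's re module as a stand-in engine and using made-up sample strings. Note that the first expression matches any line containing the letters in order, so it can also match ordinary sentences.

```python
import re

spaced = re.compile(r"^.*v.*i.*a.*g.*r.*o.*$", re.IGNORECASE)
separated = re.compile(
    r"cr[eéèêë][\+\-\*=<>\.\,;!\?%&§@\^°\$£\{\}()\[\]\|\\_01]dit",
    re.IGNORECASE,
)

# Letters separated by inserted characters are still caught.
print(bool(spaced.search("V-i-a-g-r-o")))  # True
# An ordinary sentence with the same letters in order also matches.
print(bool(spaced.search("Have I said a good word or so?")))  # True

# The second expression requires exactly one inserted character.
print(bool(separated.search("cre-dit")))  # True
print(bool(separated.search("Cre0dit")))  # True
print(bool(separated.search("credit")))   # False
```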

Block common spam phrases

The following expressions match some common phrases found in spam messages.

/try it for free/i

/student loans/i

/you’re already approved/i

/special[\+\-\*=<>\.\,;!\?%&~#§@\^°\$£\{\}()\[\]\|\\_1]offer/i
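The expressions above can be combined into a simple check. This is a sketch under stated assumptions: Python's re module stands in for the Perl-compatible engine, and SPAM_PATTERNS and matches_spam_phrase are hypothetical names.

```python
import re

# Each expression from the list above, compiled case-insensitively (the /i option).
SPAM_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"try it for free",
        r"student loans",
        r"you’re already approved",
        r"special[\+\-\*=<>\.\,;!\?%&~#§@\^°\$£\{\}()\[\]\|\\_1]offer",
    )
]


def matches_spam_phrase(text):
    """Return True if any of the spam expressions matches the text."""
    return any(p.search(text) for p in SPAM_PATTERNS)


print(matches_spam_phrase("Try It For FREE today"))    # True
print(matches_spam_phrase("A special-offer for you"))  # True
print(matches_spam_phrase("A special offer for you"))  # False
print(matches_spam_phrase("Meeting moved to noon"))    # False
```

The third result is False because a space is not among the separator characters in the last expression. Likewise, the third expression uses a typographic apostrophe (’), so text written with a straight apostrophe (') would not match it.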
