Wildcards and Perl regular expressions
Many list entries in Security Profiles features can include wildcards or Perl regular expressions.
For more information about using Perl regular expressions, see http://perldoc.perl.org/perlretut.html.
Regular expression vs. wildcard match pattern
A wildcard character is a special character that represents one or more other characters. The most commonly used wildcard characters are the asterisk (*), which typically represents zero or more characters in a string of characters, and the question mark (?), which typically represents any one character.
In Perl regular expressions, the ‘.’ character matches any single character. It is similar to the ‘?’ character in a wildcard match pattern. As a result:
- example.com matches not only example.com but also any string with a single character in place of the dot, such as exampleacom, examplebcom, examplexcom, and so on.
|
To add a question mark ( |
To match a special character such as ‘.’ or ‘*’ literally, precede it with the escape character ‘\’. For example:
- To match example.com, the regular expression should be: example\.com
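The difference between an unescaped and an escaped dot can be checked with a short sketch. It uses Python's re module, which treats ‘.’ and ‘\.’ the same way Perl does for simple patterns like these; that the filter engine behaves identically is an assumption, and the sample host names are illustrative only.

```python
import re

unescaped = re.compile(r"example.com")   # '.' matches any single character
escaped = re.compile(r"example\.com")    # '\.' matches a literal dot only

# Illustrative inputs, not taken from any real filter list.
hosts = ["example.com", "exampleXcom", "example-com"]

unescaped_hits = [h for h in hosts if unescaped.search(h)]
escaped_hits = [h for h in hosts if escaped.search(h)]

print(unescaped_hits)
print(escaped_hits)
```

Under these assumptions the unescaped pattern accepts all three strings, while the escaped pattern accepts only example.com.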
In Perl regular expressions, ‘*’ matches the character before it zero or more times, not zero or more occurrences of any character. For example:
- exam*.com matches exammmm.com but does not match example.com
To match any character zero or more times, use ‘.*’, where ‘.’ means any character and ‘*’ means zero or more times. For example, the wildcard match pattern exam*.com should be written as the regular expression exam.*\.com.
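As a rough illustration of the difference between the two readings of exam*.com, the sketch below compares them in Python's re module and converts a wildcard pattern into a regular expression. The helper wildcard_to_regex is hypothetical and not part of any product; it only covers the ‘*’ and ‘?’ wildcards described above.

```python
import re

literal_star = re.compile(r"exam*.com")    # 'm' repeated, then any one character
any_run = re.compile(r"exam.*\.com")       # regex form of the wildcard exam*.com


def wildcard_to_regex(pattern):
    """Hypothetical helper: translate '*' and '?' wildcards into a regex string."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")            # any character, zero or more times
        elif ch == "?":
            parts.append(".")             # any single character
        else:
            parts.append(re.escape(ch))   # escape regex special characters such as '.'
    return "".join(parts)


converted = wildcard_to_regex("exam*.com")
print(converted)
print(bool(literal_star.search("example.com")), bool(any_run.search("example.com")))
```

In this sketch the converted string is the same exam.*\.com expression given in the text, and only that form matches example.com.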
Word boundary
In Perl regular expressions, the pattern does not have an implicit word boundary. For example, the regular expression “test” not only matches the word “test” but also any word that contains “test” such as “atest”, “mytest”, “testimony”, “atestb”. The notation “\b” specifies the word boundary. To match exactly the word “test”, the expression should be \btest\b.
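The effect of the word boundary can be sketched with the same words used above. Python's re module is used here as a stand-in for the Perl engine, which is an assumption; the last sample phrase is added for illustration.

```python
import re

words = ["test", "atest", "mytest", "testimony", "atestb", "a test case"]

plain = re.compile(r"test")          # matches "test" anywhere inside a string
bounded = re.compile(r"\btest\b")    # matches "test" only as a whole word

plain_hits = [w for w in words if plain.search(w)]
bounded_hits = [w for w in words if bounded.search(w)]

print(plain_hits)
print(bounded_hits)
```

The plain pattern accepts every entry, while the bounded pattern accepts only “test” and “a test case”.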
Case sensitivity
Regular expression pattern matching is case sensitive in the web filter and Email Filter. To make a word or phrase case insensitive, add the regular expression option /i. For example, /bad language/i will block all instances of “bad language”, regardless of case.
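A minimal sketch of the same behavior in Python, where the re.IGNORECASE flag plays the role of the /i option (Python does not use the slash-delimited form, so this mapping is an assumption made for illustration):

```python
import re

case_sensitive = re.compile(r"bad language")
case_insensitive = re.compile(r"bad language", re.IGNORECASE)  # counterpart of /i

samples = ["bad language", "Bad Language", "BAD LANGUAGE"]

sensitive_hits = [s for s in samples if case_sensitive.search(s)]
insensitive_hits = [s for s in samples if case_insensitive.search(s)]

print(sensitive_hits)
print(insensitive_hits)
```

Without the flag only the all-lowercase sample matches; with it, all three do.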
Perl regular expression formats
The following table lists and describes some example Perl regular expressions.
Expression | Matches |
---|---|
abc | “abc” (the exact character sequence, but anywhere in the string) |
^abc | “abc” at the beginning of the string |
abc$ | “abc” at the end of the string |
a\|b | Either “a” or “b” |
^abc\|abc$ | The string “abc” at the beginning or at the end of the string |
ab{2,4}c | “a” followed by two, three or four “b”s followed by a “c” |
ab{2,}c | “a” followed by at least two “b”s followed by a “c” |
ab*c | “a” followed by any number (zero or more) of “b”s followed by a “c” |
ab+c | “a” followed by one or more “b”s followed by a “c” |
ab?c | “a” followed by an optional “b” followed by a “c”; that is, either “abc” or “ac” |
a.c | “a” followed by any single character (not newline) followed by a “c” |
a\.c | “a.c” exactly |
[abc] | Any one of “a”, “b” and “c” |
[Aa]bc | Either of “Abc” and “abc” |
[abc]+ | Any (nonempty) string of “a”s, “b”s and “c”s (such as “a”, “abba”, “acbabcacaa”) |
[^abc]+ | Any (nonempty) string which does not contain any of “a”, “b”, and “c” (such as “defg”) |
\d\d | Any two decimal digits, such as 42; same as \d{2} |
/i | Makes the pattern case insensitive. For example, /bad language/i matches “bad language” regardless of case. |
\w+ | A “word”: a nonempty sequence of alphanumeric characters and low lines (underscores), such as foo and 12bar8 and foo_1 |
100\s*mk | The strings “100” and “mk” optionally separated by any amount of white space (spaces, tabs, newlines) |
abc\b | “abc” when followed by a word boundary (for example, in “abc!” but not in “abcd”) |
perl\B | “perl” when not followed by a word boundary (for example, in “perlert” but not in “perl stuff”) |
/x | Tells the regular expression parser to ignore white space that is neither preceded by a backslash character nor within a character class. Use this to break up a regular expression into (slightly) more readable parts. |
/pattern/options | Used to add regular expressions within other text. If the first character in a pattern is a forward slash ‘/’, the ‘/’ is treated as the delimiter. The pattern must contain a second ‘/’. The text between the two ‘/’ characters is taken as the regular expression, and anything after the second ‘/’ is parsed as a list of regular expression options (‘i’, ‘x’, and so on). An error occurs if the second ‘/’ is missing. In regular expressions, leading and trailing spaces are treated as part of the regular expression. |
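The slash-delimited form described in the last row of the table can be sketched as a small parser. This is only an illustration of the rules as stated, not the product's implementation: compile_entry and OPTION_FLAGS are hypothetical names, Python's re flags stand in for the Perl options, and escaped slashes inside the pattern are not handled.

```python
import re

# Assumed mapping from option letters to Python flags; the text names only 'i' and 'x'.
OPTION_FLAGS = {"i": re.IGNORECASE, "x": re.VERBOSE}


def compile_entry(entry):
    """Hypothetical helper: compile a list entry that may use the /pattern/options form."""
    if not entry.startswith("/"):
        return re.compile(entry)       # no delimiter: the whole entry is the pattern
    end = entry.find("/", 1)           # position of the second '/'
    if end == -1:
        raise ValueError("missing closing '/'")
    flags = 0
    for letter in entry[end + 1:]:     # everything after the second '/' is an option
        if letter not in OPTION_FLAGS:
            raise ValueError(f"unsupported option: {letter!r}")
        flags |= OPTION_FLAGS[letter]
    # Spaces inside the delimiters are kept as part of the pattern.
    return re.compile(entry[1:end], flags)


print(bool(compile_entry("/bad language/i").search("BAD LANGUAGE")))
print(bool(compile_entry("/ab{2,4} c/x").search("abbbc")))
```

In the second call the x option makes the parser ignore the space inside the pattern, so it behaves like ab{2,4}c.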
Examples of regular expressions
Block any word in a phrase
/block|any|word/
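A quick way to see what this alternation matches is to try it on a few phrases. The sketch uses Python's re module in place of the Perl engine (an assumption), and the sample phrases and the word-bounded variant are added for illustration.

```python
import re

pattern = re.compile(r"block|any|word")
whole_words = re.compile(r"\b(?:block|any|word)\b")   # illustrative stricter variant

phrases = ["please block this", "keyword list", "company news", "nothing here"]

hits = [p for p in phrases if pattern.search(p)]
whole_word_hits = [p for p in phrases if whole_words.search(p)]

print(hits)
print(whole_word_hits)
```

Because there is no implicit word boundary, the original pattern also matches “keyword” and “company”; the bounded variant matches only the first phrase.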
Block purposely misspelled words
Spammers often insert other characters between the letters of a word to fool spam blocking software.
/^.*v.*i.*a.*g.*r.*o.*$/i
/cr[eéèêë][\+\-\*=<>\.\,;!\?%&§@\^°\$£\{\}()\[\]\|\\_01]dit/i
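The two expressions above can be tried against sample text as follows. The patterns are copied from the examples, with Python's re.IGNORECASE standing in for /i; the variable names and sample strings are illustrative assumptions.

```python
import re

# Letters of the word in order, with anything (or nothing) between them.
spaced = re.compile(r"^.*v.*i.*a.*g.*r.*o.*$", re.IGNORECASE)

# Exactly one filler character from the class between "cre" and "dit".
separated = re.compile(
    r"cr[eéèêë][\+\-\*=<>\.\,;!\?%&§@\^°\$£\{\}()\[\]\|\\_01]dit",
    re.IGNORECASE,
)

spaced_samples = ["V.i.a.g.r.o", "buy v i a g r o now", "very big bags are on", "hello world"]
separated_samples = ["cre-dit", "Cré_dit", "cre1dit", "credit"]

spaced_hits = [s for s in spaced_samples if spaced.search(s)]
separated_hits = [s for s in separated_samples if separated.search(s)]

print(spaced_hits)
print(separated_hits)
```

The first expression is very broad: it also matches an unrelated sentence such as “very big bags are on”, because the letters only have to appear in order. The second requires exactly one filler character, so the correctly spelled “credit” is not matched.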
Block common spam phrases
The following phrases are some examples of common phrases found in spam messages.
/try it for free/i
/student loans/i
/you’re already approved/i
/special[\+\-\*=<>\.\,;!\?%&~#§@\^°\$£\{\}()\[\]\|\\_1]offer/i
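As a sketch of how a list of such entries might be applied to a message, the example below checks a text against all four phrases. It uses Python's re module with re.IGNORECASE in place of /i; SPAM_PATTERNS and matched_patterns are hypothetical names, and the sample messages are invented for illustration.

```python
import re

# Patterns copied from the list above. The apostrophe in the third entry is the
# typographic character as printed, so a plain ASCII apostrophe does not match it.
SPAM_PATTERNS = [
    re.compile(r"try it for free", re.IGNORECASE),
    re.compile(r"student loans", re.IGNORECASE),
    re.compile(r"you’re already approved", re.IGNORECASE),
    re.compile(
        r"special[\+\-\*=<>\.\,;!\?%&~#§@\^°\$£\{\}()\[\]\|\\_1]offer",
        re.IGNORECASE,
    ),
]


def matched_patterns(message):
    """Return the pattern text of every entry found in the message."""
    return [p.pattern for p in SPAM_PATTERNS if p.search(message)]


print(matched_patterns("TRY IT FOR FREE today"))
print(len(matched_patterns("Special_Offer on Student Loans")))
print(matched_patterns("Meeting moved to noon"))
```

Note that the last expression requires one filler character from the class, so the ordinary phrase “special offer” with a space is not matched by it.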