User Guide

Differences in Analytics Semantics between EventDB and Elasticsearch

FortiSIEM can run on EventDB, its own proprietary NoSQL database, or on Elasticsearch. To make analytics work correctly in both environments, you need to understand how they differ. Analytics includes real-time search, historical search, and rule correlation.

FortiSIEM rule correlation and real-time search work identically in both environments, because the computation is done in memory and does not use the database.

Historical search, however, obtains its results from the database, so the following differences apply, mostly in string comparisons and primarily because of the way Elasticsearch, a third-party product, works.

Issues

  1. EventDB uses substring matching, while Elasticsearch uses word-based matching with white space as the delimiter between words. EventDB therefore finds a match anywhere in the string, whereas for Elasticsearch you must include wildcard characters explicitly. This affects string operations that use the following operators: =, IN, CONTAIN, REGEXP, and their inverses: !=, NOT IN, NOT CONTAIN, and NOT REGEXP. See also Example 1 - Matching Event Types and Example 2 - Matching Raw Messages.
  2. For Elasticsearch queries, if a display parameter is an expression that includes aggregate functions, each aggregate must also be added as a separate display parameter. For example, to display the expression 100 - (100.0 * SUM(System Downtime))/SUM(Polling Interval), you must also add SUM(System Downtime) and SUM(Polling Interval) to the list of display parameters.
  3. Sorting does not work for:
    • the LAST and FIRST operators when the operand is not a Date type
    • the HourOfDay and DayOfWeek operators
  4. When results are sorted on multiple keys, for example Group By Source IP, Destination IP with COUNT(*) DESC, Elasticsearch presents the results ordered by the last attribute (Destination IP in this example), whereas EventDB sorts by all the fields taken together as a tuple, for example (Source IP, Destination IP). For details, see:

    https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html

  5. Elasticsearch (and the underlying Lucene library) does not support the full Perl-compatible regex syntax. See:
    https://www.elastic.co/guide/en/elasticsearch/reference/current/regexp-syntax.html
    Elasticsearch Support for Regex, below, lists the supported syntax and suggested workarounds.
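For item 2, the following Python sketch lists the aggregate calls inside a display expression that are not yet display parameters on their own. It is a hypothetical helper, not a FortiSIEM API. The set of aggregate names it recognizes (SUM, AVG, MIN, MAX, COUNT, FIRST, LAST) is an assumption, and it compares calls as exact strings, so SUM(System Downtime) and SUM( System Downtime ) count as different.

```python
import re

# Assumption: aggregate names recognized by this sketch; FortiSIEM's actual
# list may differ.
AGGREGATES = ("SUM", "AVG", "MIN", "MAX", "COUNT", "FIRST", "LAST")

# Matches an aggregate call such as SUM(System Downtime) or COUNT(*).
# Nested parentheses inside a call are not handled.
_AGG_CALL = re.compile(r"\b(?:" + "|".join(AGGREGATES) + r")\([^()]*\)")


def missing_aggregates(expression: str, display_params: list[str]) -> list[str]:
    """Return the aggregate calls used in `expression` that are not yet listed
    in `display_params`, in order of first appearance and without duplicates."""
    missing: list[str] = []
    for call in _AGG_CALL.findall(expression):
        if call not in display_params and call not in missing:
            missing.append(call)
    return missing


expr = "100 - (100.0 * SUM(System Downtime))/SUM(Polling Interval)"
print(missing_aggregates(expr, [expr]))
# → ['SUM(System Downtime)', 'SUM(Polling Interval)']
```

Once both calls are added next to the expression, the helper returns an empty list.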

Example 1 - Matching Event Types

Suppose you want to match event types that contain PH_DEV_MON:

  • In EventDB, you can write any of the following:
    • EventType CONTAIN PH_DEV_MON
    • EventType CONTAIN _DEV_MON
    • EventType CONTAIN ph_dev_MON
    • EventType CONTAIN _DEV_mon
  • In Elasticsearch, you can write either of the following. Because event types do not end with PH_DEV_MON, you must add the wildcard ".*" at the end, and also at the start when the pattern does not begin the event type.
    • EventType CONTAIN PH_DEV_MON.*
    • EventType CONTAIN .*_DEV_MON.*

Suppose you want to match the event type PH_DEV_MON_INTF_UTIL exactly:

  • In EventDB, you can write any of the following:
    • EventType = PH_DEV_MON_INTF_UTIL
    • EventType = ph_dev_mon_intf_util
    • EventType = ph_dev_MON_INTF_UTIL
  • In Elasticsearch, you must write:
    • EventType = PH_DEV_MON_INTF_UTIL
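The matching rules in Example 1 can be modeled in a few lines of Python. This is a sketch under stated assumptions, not FortiSIEM code: EventDB CONTAIN and = are modeled as case-insensitive substring and equality checks, and Elasticsearch CONTAIN as a case-sensitive regex that must match the whole value, with Python's re.fullmatch standing in for Lucene. All function names are hypothetical.

```python
import re

# Hypothetical helpers that model the matching rules shown in Example 1.


def eventdb_contain(value: str, needle: str) -> bool:
    """EventDB-style CONTAIN (modeled): case-insensitive substring match."""
    return needle.lower() in value.lower()


def eventdb_equals(value: str, other: str) -> bool:
    """EventDB-style = (modeled): case-insensitive comparison of the whole value."""
    return value.lower() == other.lower()


def es_contain(value: str, pattern: str) -> bool:
    """Elasticsearch-style CONTAIN (modeled): a case-sensitive regex that must
    match the entire value, so wildcards such as .* must be written explicitly."""
    return re.fullmatch(pattern, value) is not None


def es_equals(value: str, other: str) -> bool:
    """Elasticsearch-style = on a keyword attribute (modeled): exact and case-sensitive."""
    return value == other


event_type = "PH_DEV_MON_INTF_UTIL"
for needle in ["PH_DEV_MON", "_DEV_MON", "ph_dev_MON", "_DEV_mon"]:
    assert eventdb_contain(event_type, needle)
assert not es_contain(event_type, "PH_DEV_MON")  # no trailing wildcard
assert es_contain(event_type, "PH_DEV_MON.*")
assert es_contain(event_type, ".*_DEV_MON.*")
assert eventdb_equals(event_type, "ph_dev_mon_intf_util")
assert not es_equals(event_type, "ph_dev_MON_INTF_UTIL")
```

The asserts mirror the bullets above: every EventDB form matches, while the Elasticsearch model needs explicit wildcards and exact case.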

Example 2 - Matching Raw Messages

REGEX matching in FortiSIEM EventDB is case-insensitive.

Suppose the raw message is:

  • XYZ info="ABB123CCC"

To match this raw message:

  • In EventDB, you can write any of the following:
    • Raw Message REGEX bb[0-9]*c*X?
    • Raw Message REGEX Abb[0-9]*c*X?"$
  • In Elasticsearch, you can write either of the following:
    • Raw Message REGEX BB[0-9]*c*X?
    • Raw Message REGEX .*BB[0-9]*c*X?
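A similar sketch models the EventDB side of Example 2: a case-insensitive regex search that can match anywhere in the raw message. Python's re module stands in for EventDB's regex engine here (an assumption; the engines can differ on advanced syntax), the helper name eventdb_regex is hypothetical, and the raw message is written with straight double quotes.

```python
import re

# Sample raw message from Example 2, written with straight double quotes.
RAW = 'XYZ info="ABB123CCC"'


def eventdb_regex(raw: str, pattern: str) -> bool:
    """Model of EventDB REGEX: a case-insensitive search, so the pattern can
    match anywhere in the raw message unless it is anchored with ^ or $."""
    return re.search(pattern, raw, flags=re.IGNORECASE) is not None


for pattern in [r'bb[0-9]*c*X?', r'Abb[0-9]*c*X?"$', r'^ABB']:
    print(pattern, eventdb_regex(RAW, pattern))
```

The first two patterns match; ^ABB does not, because the raw message starts with XYZ.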

Elasticsearch Support for Regex

Regex syntax, Elasticsearch support, and workaround (if any):

  • ". ? + * |": supported.
  • "?? +? *?": not supported (no workaround).
  • "()": supported.
  • "(?:)": not supported. Use () instead; for example, replace (?:com|net|org) with (com|net|org).
  • "[]": supported.
  • "[^]": supported.
  • "{}": supported.
  • "{}?": not supported (no workaround).
  • "^ $": not supported. Elasticsearch always requires the pattern to match the whole value, so anchors are implicit. Add .* for a partial match.
  • "\d \D \w \W \s \S": not supported. Replace them as follows:
    • \d with [0-9]
    • \D with [^0-9]
    • \w with [a-zA-Z0-9_]
    • \W with [^a-zA-Z0-9_]
    • \s with [ \t\n\r]
    • \S with [^ \t\n\r]
  • "\b \A \Z": not supported (no workaround).
  • "(?i:)": not supported (no workaround).
  • "\1 \2": not supported (no workaround).
  • "(?=)": not supported (no workaround).
  • "(?!)": not supported (no workaround).
  • "(?#)": not supported (no workaround).
  • Case-insensitive match on keyword attributes: not supported. Attributes that are not keywords are stored in lower case in Elasticsearch, so match them with a lowercase pattern such as abc; for keyword attributes, spell out both cases, as in [aA][bB][cC].
  • Entire raw message search: not supported. Elasticsearch tokenizes string attributes on spaces, so a regex cannot match the whole string. Use the CONTAIN operator instead.
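The workarounds under Elasticsearch Support for Regex are mechanical enough to script. The following Python sketch rewrites a Perl-style search pattern into the supported subset: it replaces shorthand classes with the listed character classes, turns (?:...) into (...), drops ^ and $ at the ends and adds .* where the pattern was unanchored, and raises ValueError for constructs that have no workaround. The function name to_lucene is hypothetical, the set of characters it escapes for Lucene is an assumption to check against the Elasticsearch regexp syntax page, and it handles neither case-insensitive matching nor a ] placed first inside [...].

```python
# Hypothetical helper: rewrite a Perl-style *search* pattern (may match
# anywhere) into the regex subset Elasticsearch supports (must match the
# whole value), following the listed workarounds.

# Shorthand-class replacements, as listed in the workaround column.
SHORTHAND = {
    "d": "[0-9]",
    "D": "[^0-9]",
    "w": "[a-zA-Z0-9_]",
    "W": "[^a-zA-Z0-9_]",
    "s": "[ \\t\\n\\r]",
    "S": "[^ \\t\\n\\r]",
}

# Assumption: characters Lucene reserves or may treat as operators; they are
# escaped with a backslash when they appear outside [...].
LUCENE_ESCAPE = set('"@&~<>#')


def to_lucene(pattern: str) -> str:
    out = []
    anchored_start = anchored_end = False
    in_class = False
    i, n = 0, len(pattern)
    while i < n:
        c = pattern[i]
        nxt = pattern[i + 1] if i + 1 < n else ""
        if c == "\\" and nxt:
            # \b \A \Z and backreferences \1..\9 have no workaround.
            if nxt in "bAZ" or nxt in "123456789":
                raise ValueError(f"unsupported escape \\{nxt}")
            if nxt in SHORTHAND:
                if in_class:  # e.g. [\d_]; not handled by this sketch
                    raise ValueError("shorthand class inside [...]")
                out.append(SHORTHAND[nxt])
            else:
                out.append(c + nxt)  # keep other escapes as written
            i += 2
            continue
        if in_class:
            # Copy class contents verbatim; a ] placed first is not handled.
            if c == "]":
                in_class = False
            out.append(c)
        elif c == "[":
            in_class = True
            out.append(c)
        elif c == "(" and nxt == "?":
            if pattern.startswith("(?:", i):
                out.append("(")  # non-capturing group -> plain group
                i += 3
                continue
            # (?i:) (?=) (?!) (?#) and other (?...) forms
            raise ValueError("unsupported group syntax (?...)")
        elif c in "*+?}" and nxt == "?":
            raise ValueError("lazy quantifiers are not supported")
        elif c == "^" and i == 0:
            anchored_start = True  # implicit: the whole value must match
        elif c == "$" and i == n - 1:
            anchored_end = True
        elif c in "^$":
            raise ValueError("^ and $ are only handled at the ends")
        elif c in LUCENE_ESCAPE:
            out.append("\\" + c)
        else:
            out.append(c)
        i += 1
    body = "".join(out)
    if anchored_start and anchored_end:
        return body
    # Pad unanchored ends with .*; the group keeps a top-level | intact.
    prefix = "" if anchored_start else ".*"
    suffix = "" if anchored_end else ".*"
    return f"{prefix}({body}){suffix}"
```

For example, to_lucene(r"\d+") returns .*([0-9]+).*, and to_lucene(r"^(?:com|net|org)$") returns (com|net|org). Wrapping the body in a group before padding keeps an alternation such as a|b from binding to the added .* on only one side.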
