What's New in 6.7.0
This document describes the additions for the FortiSIEM 6.7.0 release.
New Features
Active-Active Supervisor Cluster
This release allows you to run multiple Supervisors in an Active-Active mode. This means that you can log in to any Supervisor node and perform any operation. Rule, Query, and Inline Report processing, as well as App Server functionality, are distributed across the available Supervisor nodes. An Active-Active Supervisor Cluster provides resilience against Supervisor failures and supports a larger number of GUI users, Agents, and Collectors, as well as a higher volume of rule and query processing.
An Active-Active Supervisor cluster is built around the concept of Leader and Follower Supervisor nodes, set up as a linked list.
1. The first Supervisor node, where you install the License, is the Leader. On this node, the UUID matches the UUID in the installed FortiCare License, and the Redis and PostgreSQL database services run in Master mode.
2. Next, you add a Supervisor Follower node, which follows the Leader node in the sense that its PostgreSQL and Redis databases are replicated from the corresponding databases on the Leader node. On this node, both the Redis and PostgreSQL database services run in Follower (that is, read only) mode.
3. You can add another Supervisor Follower node, and it will follow the Supervisor node added in Step 2. Its PostgreSQL and Redis databases are replicated from the corresponding databases running on the node created in Step 2. On this node, both the Redis and PostgreSQL database services run in Follower (that is, read only) mode.
4. You can add more Supervisor Follower nodes by successively chaining from the tail Follower node (in this case, the node created in Step 3).
The steps to set up Supervisor nodes are available under Adding Primary Leader and Adding Primary Follower under Configuring and Maintaining Active-Active Supervisor Cluster.
If the Supervisor Leader node fails, you can elect its Follower as (temporary) Leader and within 2 weeks install a new FortiCare License matching its UUID. If any other Supervisor fails, you simply delete the node from the GUI.
Steps to handle various Supervisor failure scenarios are available in Failover Operations.
A Load Balancer can be set up in front of the Supervisor nodes. Agents and Collectors should be configured to communicate with the Supervisor cluster via the Load Balancer. The Load Balancer should be configured to route web sessions from Agents and Collectors to the Supervisor Cluster in a sticky fashion, meaning the same HTTPS session is always routed to the same Supervisor node.
For an example setup of a FortiWeb Load Balancer, see External Load Balancer Configuration.
Agents and Collectors running versions earlier than 6.7.0 will still communicate with the Supervisor they were configured with (likely the Leader node). Once upgraded to 6.7.0, Agents and Collectors will communicate via the Load Balancer. The list of Supervisors, or the Load Balancer, needs to be configured in the FortiSIEM GUI in ADMIN > Settings > Cluster Config.
Active-Active Supervisor is a licensed feature. This means that a new License that includes the High Availability feature needs to be installed to enable you to add a Supervisor Follower node. The License details show the availability of the High Availability feature in ADMIN > License and in the output of the CLI command "phLicenseTool --verify".
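For example (a hedged sketch; the exact output wording may differ between releases, so inspect the full output if the filter returns nothing):

# Check whether the installed license includes the High Availability feature.
phLicenseTool --verify | grep -i "high availability"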
An Active-Active Supervisor Cluster works with the Disaster Recovery feature from earlier releases. Note that the Supervisors in an Active-Active Supervisor Cluster are Read/Write, while the Supervisor in Disaster Recovery is Read Only.
AWS S3 Archive for ClickHouse
If you are deployed in AWS and running ClickHouse for event storage, you can now use AWS S3 for archiving events when ClickHouse Hot or Warm storage becomes full. The AWS S3 data movement and queries are handled within ClickHouse, and the retention policy applies to the entire storage pipeline (Hot -> Warm -> Archive).
Note that this AWS S3 Archive feature can be used only if you are deployed in AWS. If you are deployed on-premises, then AWS S3 Archive does not work because of the higher network latency between the ClickHouse Supervisor/Worker nodes and AWS S3 storage.
Additionally, when storing ClickHouse data in AWS S3, Fortinet recommends turning Bucket Versioning off, or suspending it (if it was previously enabled). This is because data in ClickHouse files may change and versioning will keep both copies of data - new and old. With time, the number of stale objects may increase, resulting in higher AWS S3 costs. If versioning was previously enabled for the bucket, Fortinet recommends suspending it and configuring a policy to delete non-current versions.
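As an illustration only (not taken from the FortiSIEM documentation), the following AWS CLI sketch suspends Bucket Versioning and adds a lifecycle rule that expires non-current object versions. The bucket name and the 7-day retention window are placeholders to adapt to your environment.

# Suspend versioning on the archive bucket (placeholder bucket name).
aws s3api put-bucket-versioning \
  --bucket my-fortisiem-archive-bucket \
  --versioning-configuration Status=Suspended

# Expire non-current (stale) object versions to control AWS S3 costs.
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-fortisiem-archive-bucket \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "expire-noncurrent-versions",
        "Status": "Enabled",
        "Filter": { "Prefix": "" },
        "NoncurrentVersionExpiration": { "NoncurrentDays": 7 }
      }
    ]
  }'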
Steps to set up a ClickHouse AWS S3 Archive are available under Creating ClickHouse Archive Storage.
For additional information on ClickHouse deployments, see Configuring ClickHouse Based Deployments.
For information on configuration for Worker nodes, see Initial Configuration in ClickHouse Configuration.
Custom ServiceNow Incident Integration
In earlier releases, FortiSIEM used the ServiceNow built-in incident database table for Incident Outbound and Inbound integration. In this release, a custom ServiceNow database table can be used instead.
For details on how to create custom ServiceNow integration, see ServiceNow SOAP Based Integration.
Key Enhancements
Compliance Reports for Saudi Arabia Essential Cybersecurity Controls
Compliance reports have been added for Saudi Arabia Essential Cybersecurity Controls (KSA-ECC).
Detailed Audit Log for Failed CMDB Outbound Integrations
A detailed audit log (PH_AUDIT_CMDB_OUTBOUND_INT_FAILED) is created for failed CMDB Outbound Integrations. It captures the host for which the Integration failed.
A Sample log:
2022-11-15 16:21:32,193 INFO [http-listener-2(7)] com.ph.phoenix.framework.logging.PhAudit - [PH_AUDIT_CMDB_OUTBOUND_INT_FAILED]:[phCustId]=1,[reptVendor]=ServiceNow,[hostName]=HOST-10.1.1.1,[eventSeverity]=PHL_INFO,[phEventCategory]=2,[infoURL]=https://ven02318.service-now.com/,[hostIpAddr]=10.1.1.1,[procName]=AppServer,[srcIpAddr]=10.10.1.1,[user]=admin,[objType]=Device,[phLogDetail]=HOST-10.1.1.1 was not uploaded because: Could not find this device type,Cisco WLAN Controller, in Mapping Config file.
Detailed Reason for Query REST API Failure
The Query REST API provides a detailed reason in situations where the Query fails.
API: https://<Supervisor_IP>/phoenix/rest/query/progress/<queryId>
// Error case
<response requestId="10201" timestamp="1666376638748">
  <result>
    <progress>100</progress>
    <error code="9">
      <description>Not supported expr: LEN</description>
    </error>
  </result>
</response>

// Good case
<response requestId="10201" timestamp="1666376638748">
  <result>
    <progress>100</progress>
    <error code="0">
      <description></description>
    </error>
  </result>
</response>
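For illustration, a hedged curl sketch for polling this endpoint is shown below. The credentials, certificate handling, and query ID are placeholders, and your deployment may require a different authentication mechanism.

# Placeholder values -- substitute your Supervisor IP, credentials, and query ID.
# -k skips certificate verification; remove it if your Supervisor uses a trusted certificate.
curl -k -u "<user>:<password>" \
  "https://<Supervisor_IP>/phoenix/rest/query/progress/<queryId>"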
Bug Fixes and Minor Enhancements
Bug ID | Severity | Module | Description |
---|---|---|---|
859950, 858445 | Major | App Server | Rule evaluation is slow when filter condition uses individual Malware and Country Group Items. |
854349 | Major | App Server | Malware hash update from large 100K line CSV file causes App Server CPU to reach 90%. |
857967, 857550 | Major | Rule | RuleWorker pauses when there is a rule change and App Server is busy. |
861554 | Minor | App Server | Custom rules for Org trigger incidents even if they are disabled for that Org but may be enabled for other Orgs. |
860526 | Minor | App Server | Windows/Linux Agent status may become incorrect after App Server restart. |
860517 | Minor | App Server | SQL Injection vulnerability in CMDB Report Display fields. |
858900 | Minor | App Server | Worker rule download from App Server may be slow due to sub-optimal handling of Techniques and Tactics. |
858787 | Minor | App Server | Device usage counts incorrect in enterprise environment after the upgrade from 6.4.0 to 6.6.2. |
857944 | Minor | App Server | OKTA Authentication redirects to Flash UI. |
851078 | Minor | App Server | Incident email notification may be slow when email is mis-configured. |
851077 | Minor | App Server | Incident queries by Incident ID may time out if there are lots of incidents stored in PostgreSQL over a long period of time. |
848287 | Minor | App Server | Source/Destination Interface SNMP Index does not display due to App Server inserting the name in front. |
843361 | Minor | App Server | Windows Agent license is counted even after moving the server to the Decommissioned folder in CMDB. |
843342 | Minor | App Server | Incident title and name are empty for auto clear Incidents triggered by OSPF Neighbor Down Rule. |
840694 | Minor | App Server | AGENT method disappears from CMDB Method column when SNMP discovery is re-run. |
839552 | Minor | App Server | Cloned User defined Event Type becomes System. |
838600 | Minor | App Server | Device name change does not take effect on collectors that do not discover/monitor the device. |
835741 | Minor | App Server | Custom rules in 'performance, availability and change' category are triggered for devices which are in Maintenance Window. |
819401 | Minor | App Server | STIX/TAXII 2.0 feed is not being pulled from Anomali free service. |
809914 | Minor | App Server | Dashboard is orphaned when the owner is deleted from FortiSIEM. |
853913 | Minor | Data | Some reports, e.g. FDC Top Victims and FGT Traffic Top Applications, are incorrect. |
848577 | Minor | Data | AOWUA_DNSParser does not set Event Severity. |
846604 | Minor | Data | System rules involving City trigger incorrectly with City = N/A. |
842119 | Minor | Data | FortiSandboxParser does not parse 'File Name' attribute. |
839546 | Minor | Data | FortiDeceptor logs are not being correctly parsed by FortiSIEM. |
822231 | Minor | Data | Event Name for PAN-IDP-37264 is incorrect. |
813268 | Minor | Data | In CheckpointCEFParser, the TCP/UDP ports are not being parsed correctly. |
853819 | Minor | Data Purger | When the retention policy triggers, the archive data for CUSTOMER_1 contains other org's data. |
801973 | Minor | Data Purger | Two issues: (1) EnforceRetentionPolicy tool does not archive. (2) Retention policy does not cover events with missing event attributes. |
857192 | Minor | Discovery | LDAP user discovery may hang. |
839636 | Minor | Discovery | FortiGate Discovery via API + SSH stops at 1% during discovery. |
838934 | Minor | Discovery | SNMP Test Connectivity fails against FortiOS 7.2.1, but discovery succeeds. |
841669 | Minor | Event Pulling | WMI/OMI event pulling may lag behind in some cases. |
862020 | Minor | Event Pulling Agent | Generic HTTPS Advanced Event Puller incorrectly sets lastPollTime window to local time instead of UTC. |
859767 | Minor | GUI | Incorrect Elasticsearch Org Bucket mapping from file when 10 groups are used, but there are gaps 50,001-50,010. GUI maps group 50,011 to 50,000. |
859557 | Minor | GUI | Unable to delete Dashboard Slideshows in super/global and orgs. |
853371 | Minor | GUI | Interface Usage Dashboard does not show data, but drill down from widget will show data. |
852369, 852362 | Minor | GUI | Cannot add extra ClickHouse disk to Supervisor without manual edit. |
846233 | Minor | GUI | Changing the logo and image of 'Global User Reports Template' in PDF export causes 'Contents' part to break. |
841762 | Minor | GUI | The search box in the collector health page doesn't work for fields like Organization and Collector ID. |
841636 | Minor | GUI | Full Admin user created using AD Group Role mapping cannot edit credentials that other users created. |
840625 | Minor | GUI | SAML users with full admin access CANNOT modify anything in User Settings pop up. |
833083 | Minor | GUI | Can't delete manager for user in CMDB. |
842025 | Minor | Linux Agent | Linux Agent fails to start up when ipv6 is disabled on ubuntu 20.04.5. |
860690 | Minor | Parser | Event parsing falls behind at 2k EPS if all events are forwarded to Kafka. |
766137 | Minor | Parser | Enhancement - WinOSWmiParser cannot parse some Spanish Attributes. |
848258 | Minor | Performance Monitoring | If Windows Agent is installed first, then SNMP performance monitoring does NOT pull all metrics on Windows servers. |
838534 | Minor | Performance Monitoring | Unable To Monitor Hyper-V Performance Metrics Through winexe-static without enabling SMB1Protocol. |
866034 | Minor | Query | On ClickHouse, queries of the type "Destination IP IN Networks > Group: Public DNS Servers" hit the SQL Size limit and do not run. |
860571 | Minor | Query | PctChange function in Query fails on ClickHouse and phQueryMaster restarts. |
846629 | Minor | Query | For EventDB, Queries using != operator on strings may not return proper results on certain conditions. |
839542 | Minor | Query | Queries with multiple expressions will fail when there is no order clause. |
861196 | Minor | Rule | phFortiInsightAI process may consume high CPU on Workers from excessive Win-Security-4624 logs. |
857424 | Minor | System | 'hdfs_archive_with_purge' is not kept from the previous phoenix config. |
844287 | Minor | System | FortiSIEM upgrade does not backup network_param.json. |
842161 | Minor | System | Collector stores certain REST API responses in files. |
810999 | Minor | System | TraceEnable is not disabled on port 80 for http. |
792418 | Minor | System | Remove Malware IOC feeds that are either obsolete or no longer active; RFE: remove/replace out-of-box integrations which are no longer active. |
840314 | Enhancement | App Server | Public Query REST API must contain detailed error messages when the query fails to run. |
838829 | Enhancement | App Server | Very large Malware IOC update causes App Server to run out of memory. |
857014 | Enhancement | Data | Update latest FortiSwitch SysObjectId list. |
852149 | Enhancement | Data | For Cisco Umbrella Parser, several fields are not parsed. |
835806 | Enhancement | Data | Imperva parser update. |
835763 | Enhancement | Data | Cisco ISE parser needs to normalize MAC addresses. |
833721 | Enhancement | Data | RSAAuthenticationServerParser parser update. |
831476 | Enhancement | Data | MicrosoftExchangeTrackingLogParser misparses values that contain ',' as a delimiter. |
830604 | Enhancement | Data | WinOSWmiParser needs enhancement for rule - Windows: Pass the Hash Activity 2. |
824927 | Enhancement | Data | Apache Parser needs to parse URI query. |
856273 | Enhancement | System | Restrict EnforceRetentionPolicy tool to run only as admin. |
839129 | Enhancement | System | CMDB backup enhancements. |
839053 | Enhancement | System | Exit code from "monctl start" incorrectly returns failure if phMonitor is still running. |
833918 | Enhancement | System | Suppress unnecessary ansible warning messages and clean up task description. |
Known Issues
General
This release cannot be installed in IPv6 networks.
ClickHouse Related
- If you are running ClickHouse event database and want to do Active-Active Supervisor failover, then your Supervisor should not be the only ClickHouse Keeper node. In that case, once the Supervisor is down, the ClickHouse cluster will be down and inserts will fail. It is recommended that you have 3 ClickHouse Keeper nodes running on Workers.
- If you are running ClickHouse, then during a Supervisor upgrade to FortiSIEM 6.7.0 or later, instead of shutting down Worker nodes, you need to stop the backend processes by running the following command from the command line.
phtools --stop all
- If you are running Elasticsearch or FortiSIEM EventDB and switch to ClickHouse, then you need to follow two steps to complete the database switch.
  1. Set up the disks on each node in ADMIN > Setup > Storage and ADMIN > License > Nodes.
  2. Configure the ClickHouse topology in ADMIN > Settings > Database > ClickHouse Config.
- In a ClickHouse environment, queries will not return results if none of the query nodes within a shard are reachable from the Supervisor and responsive. In other words, if at least 1 query node in every shard is healthy and responds to queries, then query results will be returned. To avoid this condition, make sure all Query Worker nodes are healthy.
- If more than 1 ClickHouse Keeper node is down, then the following manual steps are required before you can delete the Keeper nodes from the GUI.

Step 1. Delete the failed nodes from the ClickHouse Keeper config file.

On all surviving ClickHouse Keeper nodes, modify the Keeper config located at /data-clickhouse-hot-1/clickhouse-keeper/conf/keeper.xml:

<yandex>
  <keeper_server>
    <raft_configuration>
      <server> -- server 1 info -- </server>
      <server> -- server 2 info -- </server>
    </raft_configuration>
  </keeper_server>
</yandex>

Remove any <server></server> block corresponding to the nodes that are down.

Step 2. Prepare systemd argument for recovery.

On all surviving ClickHouse Keeper nodes, make sure the /data-clickhouse-hot-1/clickhouse-keeper/.systemd_argconf file has a single line as follows:

ARG1=--force-recovery
Step 3. Restart ClickHouse Keeper
On all surviving ClickHouse Keeper nodes, run the following command.
systemctl restart ClickHouseKeeper
Check to make sure ClickHouse Keeper is restarted with the --force-recovery option. Check with the following command.

systemctl status ClickHouseKeeper

The output of the status command will show whether --force-recovery is being applied at the "Process:" line, like so:

Process: 487110 ExecStart=/usr/bin/clickhouse-keeper --daemon --force-recovery --config=/data-clickhouse-hot-1/clickhouse-keeper/conf/keeper.xml (code=exited, status=0/SUCCESS)
Step 4. Check to make sure ClickHouse Server(s) are up
After Step 3 is done on all surviving ClickHouse Keeper nodes, wait for a couple of minutes to make sure the sync up between ClickHouse Keeper and ClickHouse Server finishes successfully. Use four-letter commands to check the status of the ClickHouse Keepers.
On each surviving ClickHouse Keeper node, issue the following command.
echo stat | nc localhost 2181
If "clickhouse-client" shell is available, then ClickHouse Server is up and fully synced up with ClickHouse Keeper cluster.
Extra commands such as mntr (maintenance) and srvr (server) will also provide overlapping and other information.

Step 5. Update Redis key for ClickHouse Keepers.
On Supervisor Leader Redis, set the following key to the comma-separated list of surviving ClickHouse Keeper nodes' IP addresses. For example, assuming 172.30.58.96 and 172.30.58.98 are surviving ClickHouse Keeper nodes, the command will be:
set cache:ClickHouse:clickhouseKeeperNodes "172.30.58.96,172.30.58.98"
You can use the following command to check the values before and after setting to the new value:
get cache:ClickHouse:clickhouseKeeperNodes
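For reference, a minimal sketch of running these commands through redis-cli is shown below. The connection parameters (host, port, and any authentication) are placeholders, since they depend on your deployment.

# Placeholders for host, port, and password -- adjust to your deployment's Redis settings.
redis-cli -h <Supervisor_Leader_IP> -p <redis_port> -a '<redis_password>' \
  get cache:ClickHouse:clickhouseKeeperNodes
redis-cli -h <Supervisor_Leader_IP> -p <redis_port> -a '<redis_password>' \
  set cache:ClickHouse:clickhouseKeeperNodes "172.30.58.96,172.30.58.98"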
Step 6. Update ClickHouse Config via GUI.
After all the changes from Steps 1-5 above have been successfully executed, you can delete all of the down ClickHouse Keeper nodes from the ADMIN > Settings > ClickHouse Config page.
Note: To re-add the ClickHouse Keeper nodes that are down:

a. First, make sure that the ClickHouse Keeper nodes that are down are removed from the ADMIN > License > Nodes page.
b. After powering up the ClickHouse Keeper node, as root, use the following script to clean up the residual config from the previously deployed instance:
   /opt/phoenix/phscripts/clickhouse/cleanup_replicated_tables_and_keeper.sh
   Since not all nodes act as a ClickHouse Keeper node and a ClickHouse data node at the same time, the script might warn that tables or the keeper directory are not found. These warning messages can be ignored.
c. After the cleanup in (b), add the nodes back via ADMIN > License > Nodes; they can then be added back to the ClickHouse Keeper or ClickHouse Data cluster.
Disaster Recovery Related
After failing over from Primary to Secondary, you need to restart phMonitor on all Secondary Workers for the queries to work properly.
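As a rough sketch only (the Worker hostnames and the exact restart command are assumptions; use the service restart method appropriate to your release), this could be scripted from the Secondary Supervisor as follows.

# Placeholder Worker hostnames -- replace with your Secondary Worker IPs/hostnames.
# Assumes phMonitor is managed as a systemd service on 6.x nodes.
for w in secondary-worker1 secondary-worker2; do
  ssh root@"$w" "systemctl restart phMonitor"
done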
Elasticsearch Related
- In Elasticsearch based deployments, queries containing "IN Group X" are handled using Elastic Terms Query. By default, the maximum number of terms that can be used in a Terms Query is set to 65,536. If a Group contains more than 65,536 entries, the query will fail.
The workaround is to change the "max_terms_count" setting for each event index. Fortinet has tested up to 1 million entries. The query response time will be proportional to the size of the group.
Case 1. For already existing indices, issue the following REST API call to update the setting:

PUT fortisiem-event-*/_settings
{
  "index": {
    "max_terms_count": "1000000"
  }
}
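A hedged curl sketch of this call is shown below; the Elasticsearch endpoint and credentials are placeholders, and the TLS/authentication options depend on how your Elasticsearch cluster is configured.

# Placeholder endpoint and credentials -- adjust scheme, host, port, and auth to your cluster.
curl -u "<elastic_user>:<password>" -X PUT "http://<elasticsearch_host>:9200/fortisiem-event-*/_settings" \
  -H "Content-Type: application/json" \
  -d '{ "index": { "max_terms_count": "1000000" } }'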
Case 2. For new indices that are going to be created in the future, update fortisiem-event-template so those new indices will have a higher max_terms_count setting.
1. cd /opt/phoenix/config/elastic/7.7
2. Add "index.max_terms_count": 1000000 (including quotations) to the "settings" section of the fortisiem-event-template. Example:
   ...
   "settings": {
     "index.max_terms_count": 1000000,
   ...
3. Navigate to ADMIN > Storage > Online and perform Test and Deploy.
4. Test that new indices have the updated terms limit by executing the following simple REST API call.
   GET fortisiem-event-*/_settings
- FortiSIEM uses dynamic mapping for Keyword fields to save Cluster state. Elasticsearch needs to encounter some events containing these fields before it can determine their type. For this reason, queries containing group by on any of these fields will fail if Elasticsearch has not seen any event containing these fields. The workaround is to first run a non-group by query with these fields to make sure that these fields have non-null values.
EventDB Related
Currently, Policy based retention for EventDB does not cover two event categories: (a) System events with phCustId = 0, e.g. a FortiSIEM External Integration Error, FortiSIEM process crash etc., and (b) Super/Global customer audit events with phCustId = 3, e.g. audit log generated from a Super/Global user running an adhoc query. These events are purged when disk usage reaches high watermark.
HDFS Related
If you are running real-time Archive with HDFS, and have added Workers after the real-time Archive has been configured, then you will need to perform a Test and Deploy for HDFS Archive again from the GUI. This will enable HDFSMgr to know about the newly added Workers.
High Availability Related
If you make changes to the following files on any node in the FortiSIEM Cluster, then you will have to manually copy these changes to other nodes.
- FortiSIEM Config file (/opt/phoenix/config/phoenix_config.txt): If you make a Supervisor (respectively Worker, Collector) related change in this file, then the modified file should be copied to all Supervisors (respectively Workers, Collectors).
- FortiSIEM Identity and Location Configuration file (/opt/phoenix/config/identity_Def.xml): This file should be identical on Supervisors and Workers. If you make a change to this file on any Supervisor or Worker, then you need to copy this file to all other Supervisors and Workers.
- FortiSIEM Profile file (ProfileReports.xml): This file should be identical on Supervisors and Workers. If you make a change to this file on any Supervisor or Worker, then you need to copy this file to all other Supervisors and Workers.
- SSL Certificate (/etc/httpd/conf.d/ssl.conf): This file should be identical on Supervisors and Workers. If you make a change to this file on any Supervisor or Worker, then you need to copy this file to all other Supervisors and Workers.
- Java SSL Certificates (the files cacerts.jks, keyfile, and keystore.jks under /opt/glassfish/domains/domain1/config/): If you change these files on a Supervisor, then you have to copy these files to all Supervisors.
- Log pulling External Certificates.
- Event forwarding Certificates defined in the FortiSIEM Config file (/opt/phoenix/config/phoenix_config.txt): If you change them on one node, you need to change them on all nodes.
- Custom cron job: If you change this file on a Supervisor, then you have to copy this file to all Supervisors.
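For example, a minimal sketch (assuming root SSH access and placeholder node names) for propagating a changed phoenix_config.txt from one Supervisor to the other cluster nodes might look like the following.

# Placeholder node names -- replace with the Supervisors/Workers/Collectors that need the change.
SRC=/opt/phoenix/config/phoenix_config.txt
for node in super2 worker1 worker2; do
  scp "$SRC" root@"$node":"$SRC"
done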