Disaster Recovery Operations

Primary (Site 1) Fails, Site 2 Becomes Primary
Site 1 is Up and Becomes Primary
Viewing Replication Health
Implementation Notes

Primary (Site 1) Fails, Site 2 Becomes Primary

If Site 1 fails, its Workers no longer function, and events are buffered at the Collectors, which are ready to push these events to Site 2. You must now prepare Elasticsearch on Site 2 to be ready for insertion.

Step 1. Switch Site 2 Role to Primary in FortiSIEM

See Switching Primary and Secondary Roles in the latest High Availability and Disaster Recovery Procedures - EventDB Guide.

Step 2. Save Elasticsearch Settings on Site 2 in FortiSIEM

After the Site 2 Role has been switched to Primary, take the following steps:

Note: If you have a custom event template on Site 1, you will need to upload the same custom event template to Site 2 first before proceeding with these instructions.

Log in to the Site 2 FortiSIEM GUI.
Navigate to ADMIN > Setup > Storage > Online.
Click Test to verify the settings.
Click Save to save the online settings.
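
As an optional sanity check alongside the GUI Test button, the Site 2 cluster can be queried directly with the Elasticsearch cluster health API. This is a minimal sketch, not part of the FortiSIEM procedure: the coordinator address is a hypothetical placeholder, and it assumes plain HTTP on port 9200 without authentication, as in the curl commands elsewhere in this document.

```shell
# Hypothetical address of the Site 2 coordinator node; replace with your own.
SITE2_ES="${SITE2_ES:-http://site2-coordinator:9200}"

# Print the cluster health status (green, yellow, or red) for one cluster.
# The status value is extracted from the JSON response with sed.
cluster_status() {
  curl -s "$1/_cluster/health?pretty" \
    | sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p'
}

# Usage:
#   cluster_status "$SITE2_ES"
```

A red status suggests that some primary shards are unassigned, which is worth resolving before events are inserted.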

Step 3. Confirm Events are Inserted to Site 2

At this point, Collectors should be communicating with the Site 2 Supervisor and should receive a set of Site 2 Event (Upload) Workers. Since Site 2 Workers are connected to the Site 2 Elasticsearch Cluster, events are now stored in the Site 2 Elasticsearch. You can verify this by running queries from the Site 2 Supervisor's ANALYTICS page.

Step 4. Confirm Incident Index is Created and Updated to Site 2

To verify that the Incident index in Elasticsearch Site 2 has been created and updated, take the following steps:

Log in to Kibana.
Navigate to Kibana Home > Analytics section > Discover > Index Management.
Find the incident index, and compare the Incident counts between Elasticsearch Site 1 and Elasticsearch Site 2.
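
The same count comparison can be sketched from the command line with the Elasticsearch `_cat/count` API. The coordinator addresses below are hypothetical placeholders, the index name must be taken from Index Management (it is deliberately not guessed here), and the sketch assumes plain HTTP without authentication. The helper names `doc_count` and `compare_counts` are illustrative only.

```shell
# Hypothetical coordinator addresses; replace with your own.
SITE1_ES="${SITE1_ES:-http://site1-coordinator:9200}"
SITE2_ES="${SITE2_ES:-http://site2-coordinator:9200}"

# Print the document count for an index (or index pattern) on one cluster.
# h=count limits the _cat/count output to the count column.
doc_count() {
  curl -s "$1/_cat/count/$2?h=count" | tr -d '[:space:]'
}

# Compare two counts and report whether they match.
compare_counts() {
  if [ "$1" = "$2" ]; then
    echo "MATCH ($1)"
  else
    echo "MISMATCH (site1=$1 site2=$2)"
  fi
}

# Usage (substitute the index name shown in Index Management):
#   compare_counts "$(doc_count "$SITE1_ES" '<index name>')" \
#                  "$(doc_count "$SITE2_ES" '<index name>')"
```

The same helpers apply to any other index whose counts need to be compared between the two sites.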

Step 5. Confirm Lookup Index is Updated to Site 2

To verify that the fortisiem-lookups index in Elasticsearch Site 2 has been updated, take the following steps:

Log in to Kibana.
Navigate to Kibana Home > Analytics section > Discover > Index Management.
Find the fortisiem-lookup index, and compare the document counts between Elasticsearch Site 1 and Elasticsearch Site 2.

Site 1 is Up and Becomes Primary

Overview

If Site 1 comes back up, you can set it to become Primary by following these general steps:

Set Site 1 to Secondary and confirm data from Site 2 is replicated to Site 1.
Stop Collectors from sending events to Site 2.
Switch Site 1 Role to Primary in FortiSIEM.
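Complete the follow-up tasks: save the Elasticsearch settings on Site 1, verify the Event Workers, and confirm that events and indices are written to Site 1 and replicated to Site 2.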

Step 1. Set Site 1 to Secondary and Confirm Data from Site 2 is Replicated to Site 1

The Site 1 CMDB must sync up with the Site 2 CMDB, since new devices, rules, reports, etc. may exist in Site 2. Hence Site 1 needs to be Secondary first.

Set Site 1 to Secondary in FortiSIEM by taking the following steps:
1. Log in to the Site 2 FortiSIEM GUI.
2. Navigate to ADMIN > License > Nodes.
3. Verify that there is a Secondary Node entry for Site 1 and that it shows Inactive under Replication status.
4. With Site 1 selected, click Edit, and double-check that the information is correct.
5. Click Save.

At this point, Site 1 is Secondary.

Make sure all information is correct by taking the following steps:
1. Log in to the Site 1 GUI, and check that the new devices, rules, and reports are up to date.
2. Compare the data on Site 1 and Site 2. All indices and document counts should be identical.
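
One way to compare indices and document counts across both sites is to list them with the Elasticsearch `_cat/indices` API and report only the differences. This is a sketch under assumptions: the helper names and host arguments are hypothetical, the index pattern is supplied by the operator, and the clusters are assumed to accept unauthenticated HTTP.

```shell
# List "index docs.count" pairs for an index pattern on one cluster, sorted by name.
list_counts() {
  curl -s "$1/_cat/indices/$2?h=index,docs.count&s=index"
}

# Print indices whose document counts differ between two listings.
# Arguments: two files, each holding the output of list_counts.
# Output columns: index, Site 1 count, Site 2 count ("absent" if missing).
diff_counts() {
  awk 'FILENAME == ARGV[1] { site1[$1] = $2; next }
       !($1 in site1)      { print $1, "absent", $2; next }
       site1[$1] != $2     { print $1, site1[$1], $2 }
                           { delete site1[$1] }
       END { for (i in site1) print i, site1[i], "absent" }' "$1" "$2"
}

# Usage (hosts and pattern are placeholders):
#   list_counts http://site1-coordinator:9200 '<index pattern>' > site1.txt
#   list_counts http://site2-coordinator:9200 '<index pattern>' > site2.txt
#   diff_counts site1.txt site2.txt
```

Empty output means both listings agree. Counts can lag briefly while replication is still catching up, so a small difference may simply call for a re-check a little later.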

Step 2. Stop Collectors from Sending Events to Site 2

After following step 1, you will need to stop the Collectors from sending events to Site 2. To do this, take the following steps:

Log in to the Site 2 GUI.
Navigate to ADMIN > Settings > System > Event Worker.
Remove all the Event Workers.
Click Save.

Collectors will now start buffering events.

Step 3. Switch Site 1 Role to Primary in FortiSIEM

See Switching Primary and Secondary Roles in the latest High Availability and Disaster Recovery Procedures - EventDB Guide.

Save Elasticsearch Settings on Site 1 in FortiSIEM

Save the Site 1 Elasticsearch settings by taking the following steps.

Note: If you have a custom event template in Site 2, you must upload the same custom event template in Site 1 first before proceeding with these instructions.

Log in to the Site 1 FortiSIEM GUI.
Navigate to ADMIN > Setup > Storage > Online.
Click Test to verify your settings.
Click Save.

Verify All Event Workers are Added to Site 1

Verify that all event workers are added to Site 1 by taking the following steps.

Log in to the Site 1 FortiSIEM GUI.
Navigate to ADMIN > Settings > System > Event Worker.
Verify all event workers are added to the Event Worker list.

All Collectors will now send events to Site 1.

Verify Events are Being Written into Site 1

Verify that events are being written into Site 1 by taking the following steps.

Log in to the Site 1 FortiSIEM GUI.
Navigate to ANALYTICS.
Run some queries and make sure events are coming in.

Confirm Incident Index is Re-Created and Updated to Site 1

To verify that the Incident index in Elasticsearch Site 1 has been created and updated, take the following steps:

Log in to Kibana.
Navigate to Kibana Home > Analytics section > Discover > Index Management.
Find the incident index, and compare the Incident counts between Elasticsearch Site 1 and Elasticsearch Site 2.

Confirm Lookup Index is Updated to Site 1

To verify that the fortisiem-lookups index in Elasticsearch Site 1 has been updated, take the following steps:

Log in to Kibana.
Navigate to Kibana Home > Analytics section > Discover > Index Management.
Find the fortisiem-lookup index, and compare the document counts between Elasticsearch Site 1 and Elasticsearch Site 2.

Verify Events are Being Replicated to Site 2

Take the following steps to check on Elasticsearch event replication.

Log in to Kibana.
Navigate to Kibana Home > Analytics section > Discover > Cross-Cluster Replication.
Verify that the follower indices are created automatically.
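
Follower indices can also be inspected from the command line with the Elasticsearch Cross-Cluster Replication APIs. The sketch below assumes the follower cluster is Site 2, that it is reachable over unauthenticated HTTP, and that the host and index arguments are supplied by the operator; the helper names are hypothetical.

```shell
# Show CCR statistics, including per-follower-index replication stats.
ccr_stats() {
  curl -s "$1/_ccr/stats?pretty"
}

# Show follower details (leader index, remote cluster, status) for one index.
follower_info() {
  curl -s "$1/$2/_ccr/info?pretty"
}

# Usage (host and index name are placeholders):
#   ccr_stats http://site2-coordinator:9200
#   follower_info http://site2-coordinator:9200 '<follower index name>'
```

An index that is actively replicating is reported with an active status in the follower details, which complements the list shown in Kibana.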

Viewing Replication Health

Replication progress is available by navigating to ADMIN > Health > Replication Health.

Implementation Notes

Changing Index Lifecycle Management Parameters on Primary

When replication is occurring, if you change the Index Lifecycle Management (ILM) age or hot/warm/cold thresholds on Primary, you will need to restart phDataPurger on Secondary. The restart is necessary to enable phDataPurger on Secondary to read the new changes.

Circuit_Breaking_Exception in Elasticsearch

Enabling Cross-Cluster Replication (CCR) may affect the heap memory usage of Elasticsearch. If you encounter a request circuit_breaking_exception in Elasticsearch, try the following solutions:

1. Increase "indices.breaker.request.limit" from its default 60% to 85%. It can be set statically in elasticsearch.yml or configured dynamically on each site with the commands below:

curl -X PUT "<Site 1's coordinator ip>:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "persistent" : {
    "indices.breaker.request.limit": "85%"
  }
}
'

curl -X PUT "<Site 2's coordinator ip>:9200/_cluster/settings?pretty" -H 'Content-Type: application/json' -d'
{
  "persistent" : {
    "indices.breaker.request.limit": "85%"
  }
}
'

2. Increase the Elasticsearch heap size in jvm.options on the data nodes, then restart all data nodes.

Circuit Breaker Settings Reference: https://www.elastic.co/guide/en/elasticsearch/reference/7.12/circuit-breaker.html
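
To confirm which request breaker limit is in effect and whether the breaker is still tripping, the cluster settings and node statistics APIs can be queried. This is a sketch with hypothetical helper names and host arguments, assuming unauthenticated HTTP access to each site's coordinator.

```shell
# Print the configured request breaker limit (explicit or default) for one cluster.
# flat_settings puts each setting on its own line so grep can pick it out.
breaker_limit() {
  curl -s "$1/_cluster/settings?include_defaults=true&flat_settings=true&pretty" \
    | grep '"indices.breaker.request.limit"'
}

# Print per-node circuit breaker statistics, including limits and trip counts.
breaker_stats() {
  curl -s "$1/_nodes/stats/breaker?pretty"
}

# Usage (hosts are placeholders; run against both sites):
#   breaker_limit http://site1-coordinator:9200
#   breaker_stats http://site1-coordinator:9200
```

If the tripped counter for the request breaker keeps rising after a change, the heap size is the next thing to review.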

Possible Inconsistent Index State (Follower and Frozen) in Secondary

If Site 1 has cold nodes and Disaster Recovery is enabled, the Secondary (Site 2) may have indices that are in both Follower and Frozen state. This causes Elasticsearch to throw the following exception: "background management of retention lease failed while following". A frozen index cannot be written to and therefore cannot be in Follower state. The index is also likely to be in Closed state and hence cannot be queried.

To resolve this problem, take the following two steps in Kibana.

Unfollow the index.
Open the index.
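
For reference, the equivalent Elasticsearch API sequence can be sketched as below. Elasticsearch requires a follower index to be paused and closed before it can be unfollowed, so the sketch issues those calls first. The helper name, host, and index arguments are hypothetical, and unauthenticated HTTP access is assumed.

```shell
# Convert a follower index into a regular index and reopen it.
# Order: pause following, close the index, unfollow, then open.
unfollow_and_open() {
  host="$1"
  index="$2"
  for action in _ccr/pause_follow _close _ccr/unfollow _open; do
    echo "POST /$index/$action"
    curl -s -X POST "$host/$index/$action"
    echo
  done
}

# Usage (host and index name are placeholders):
#   unfollow_and_open http://site2-coordinator:9200 '<affected index name>'
```

Each response is printed so it can be reviewed. The pause or close call may return an error if the index is already paused or closed; in that case the later calls are the ones that matter.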
