Custom job templates
When you select a template for your custom job, you might need to fill out additional fields, depending on the template. The following templates require additional configuration before you can apply them.
Backup Table Validation
The Backup Table Validation template verifies the integrity of the backup data at the selected location.
Select the storage pool and enter the Hadoop Distributed File System (HDFS) URL for the backup location.
Custom Template
Custom templates let you define the content of custom jobs when the built-in jobs don't meet your specific needs. You can create custom templates to operate the hosts, collect information, take actions, and more.
Custom templates require you to use the Ansible playbook YAML format to define the content. For information about Ansible specifications, refer to the official Ansible documentation.
The following example template collects the disk usage of the BigData Controller and sends it to a Slack channel:
```yaml
- name: Collect disk usage and send to slack
  hosts: controllerIp
  vars:
    slack_url: "https://hooks.slack.com/services/xxxxxxx" # your slack app webhook url
  tasks:
    - name: Collect disk usage
      command: "df -h"
      register: result
    - name: Send to slack
      uri:
        url: "{{ slack_url }}"
        body: '{"text": "{{ result.stdout }}"}'
        body_format: json
        method: POST
```
The following table shows all the Ansible inventory group names you can use as hosts values in your playbook and template. These values are pre-populated in the Ansible inventory and are automatically applied with each execution.

| Inventory group | Description |
| --- | --- |
| Service groups (for example, kudu_tserver) | Select the host(s) that run the named service. For example, using hosts: kudu_tserver in your playbook executes it on all hosts that have a kudu-tserver instance. |
| Service reachable groups (for example, kudu_reachable) | Select one reachable host from those that belong to the named service. For example, if Kudu has instances spread across three hosts, hosts: kudu_reachable randomly returns one that is reachable at execution time. |
| Role groups | Select the hosts that belong to the named role. |
| Role reachable groups | Select a random host, reachable at execution time, from the hosts with the named role. |
| Controller group | Selects the BigData Controller host. |

In addition to these groups, you can also use the host names shown on the Hosts page to directly select a particular host for the playbook execution.
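As a sketch of how these groups fit into a playbook, the earlier disk-usage example could target every Kudu tablet server host instead of the controller. The kudu_tserver group name is taken from the table above; the tasks themselves use only standard Ansible modules:

```yaml
- name: Collect disk usage from all Kudu tablet server hosts
  hosts: kudu_tserver   # inventory group from the table above
  tasks:
    - name: Collect disk usage
      command: "df -h"
      register: result
    - name: Show the collected output
      debug:
        msg: "{{ result.stdout }}"
```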
Data Log Type Appendix
The Data Log Type Appendix template re-generates the list of available log types for LogView.
This is a resource-intensive operation. Run it only if the available log types sidebar in LogView is not working properly.
Docker System Prune
The Docker System Prune template removes all unused Docker containers, networks, and images (both dangling and unreferenced) to free disk space.
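For reference, this cleanup corresponds to the standard docker system prune CLI. A minimal custom-playbook sketch of the same operation (not the template's actual content; the hosts value is an assumption to adjust for your cluster):

```yaml
- name: Prune unused Docker resources
  hosts: all   # assumption: adjust to the hosts you want to clean
  tasks:
    - name: Remove unused containers, networks, and images (dangling and unreferenced)
      command: "docker system prune --all --force"
```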
Facet Formation Manual Run
The Facet Formation Manual Run enables you to manually run a facet formation. Run this job only when the FortiView query performance is exceptionally slow.
First, select a storage pool, and then select when the facet formation should start. You can choose between starting the facet formation from the beginning, or from a specific time.
HDFS Safemode Leave
The HDFS Safemode Leave template enables you to exit HDFS safe mode after an unexpected shutdown.
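Leaving safe mode is conventionally done with the standard HDFS admin CLI. A minimal custom-playbook sketch of the same operation (not the template's actual content; the hdfs_namenode group name is a hypothetical placeholder):

```yaml
- name: Leave HDFS safe mode
  hosts: hdfs_namenode   # assumption: group for the host running the NameNode
  tasks:
    - name: Exit safe mode
      command: "hdfs dfsadmin -safemode leave"
```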
Hive Metastore Backup
The Hive Metastore Backup template creates a backup of the data in Hive Metastore and saves it to an HDFS location.
Hive Metastore Restore
The Hive Metastore Restore template restores the data in Hive Metastore from an HDFS location.
Kafka Deep Clean
The Kafka Deep Clean template deep cleans Kafka topics and reinstalls Kafka (see How to recover from an unhealthy service status).
Kafka Rebalance
The Kafka Rebalance template rebalances the data load across the hosts. This is useful when a Kafka node is decommissioned, or when a node joins or leaves the cluster. It includes replica leadership rebalance and partition rebalance. For more information, see Scaling FortiAnalyzer-BigData.
NTP Sync
The NTP Sync template performs a manual NTP time sync on all the BigData hosts. Run this job when Kudu time is out of sync (see How to recover from an unhealthy service status).
Purge Data Pipeline
The Purge Data Pipeline template resets the watermark and performs a clean restart of the pipeline.
Any unprocessed data will be lost (see How to recover from an unhealthy service status).