File Content Extraction

File Content Extraction v1.1.0

About the connector

File Content Extraction is a FortiSOAR™ utility to extract text, artifacts, and metadata from almost any file. Internet connectivity is required for the connector to download dependent packages.

This document provides information about the File Content Extraction Connector, which facilitates automated interactions with a File Content Extraction server using FortiSOAR™ playbooks. Add the File Content Extraction Connector as a step in FortiSOAR™ playbooks to perform automated operations with File Content Extraction.

Version information

Connector Version: 1.1.0

FortiSOAR™ Version Tested on: 7.4.0-3024

File Content Extraction Version Tested on: 1.1.0

Authored By: Fortinet CSE

Certified: Yes

Release Notes for version 1.1.0

The following enhancements have been made to the File Content Extraction Connector in version 1.1.0:

  • Renamed File IRI parameter to File IRI/Path in the following operations:
    • Extract Text
    • Extract Artifacts
  • The File IRI/Path parameter now accepts either a file IRI or a file path in the following operations:
    • Extract Text
    • Extract Artifacts

Installing the connector

Use the Connector Store to install the connector. For the detailed procedure to install a connector, click here.

You can also use the yum command as a root user to install the connector:

yum install cyops-connector-file-content-extraction

Prerequisites to configuring the connector

The FortiSOAR™ instance must be allowed to communicate with the following URLs:

  • repo1.maven.org
  • search.maven.org

Also, add an exception rule to your firewall to allow communication through TCP ports 80 and 443.
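
The connectivity prerequisites above can be checked before installing the connector. The following is a minimal sketch, not part of the connector itself: it attempts a plain TCP connection to each listed host on ports 80 and 443. The function name `check_reachability` and its `connect` parameter are illustrative assumptions, and a successful TCP connection does not guarantee that a proxy or content filter will allow the actual package download.

```python
import socket
from typing import Callable, Dict, Iterable, Tuple

# Hosts and ports taken from the prerequisites listed above.
REQUIRED_HOSTS = ("repo1.maven.org", "search.maven.org")
REQUIRED_PORTS = (80, 443)


def check_reachability(
    hosts: Iterable[str] = REQUIRED_HOSTS,
    ports: Iterable[int] = REQUIRED_PORTS,
    timeout: float = 5.0,
    connect: Callable = socket.create_connection,
) -> Dict[Tuple[str, int], bool]:
    """Try a TCP connection to every host/port pair and record the outcome."""
    ports = tuple(ports)  # allow any iterable to be reused for each host
    results: Dict[Tuple[str, int], bool] = {}
    for host in hosts:
        for port in ports:
            try:
                # OSError covers refused connections, timeouts, and DNS failures.
                with connect((host, port), timeout=timeout):
                    results[(host, port)] = True
            except OSError:
                results[(host, port)] = False
    return results


# Example use on the FortiSOAR host (performs real network connections):
#   for (host, port), ok in check_reachability().items():
#       print(f"{host}:{port} -> {'reachable' if ok else 'blocked'}")
```

The `connect` parameter exists only so the check can be exercised without network access, for example by passing a stub in place of `socket.create_connection`.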

Minimum Permissions Required

  • Not applicable

Configuring the connector

For the procedure to configure a connector, click here.

Configuration parameters

This connector does not require any configuration before use.

Actions supported by the connector

The following automated operations can be included in playbooks; you can also use the annotations to access the operations:

Function | Description | Annotation and Category
Extract Text | Extracts text and metadata from a file based on the file IRI/path that you have specified. | extract_text
Extract Artifacts | Extracts artifacts from a file based on the file IRI/path that you have specified. | extract_indicators
Get Backend Config | Retrieves the extraction backend configuration details. | get_backend_config

NOTE: The File Content Extraction solution pack includes the playbook Extract and Process Text From File, which can be used with any file indicator to extract its content, metadata, and artifacts, as illustrated in the following screenshot:

operation: Extract Text

Input parameters

Parameter | Description
File IRI/Path | Specify the file IRI or path where the file is located. A file IRI is an attribute of a file, which is in turn an attribute of an attachment or an indicator. NOTE: The file name is taken from the /tmp directory.
Set Output Format to XHTML | Select this option to format the extracted text as XHTML.

Output

The output contains the following populated JSON schema:

{
    "metadata": {},
    "extracted_text": ""
}
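
As an illustration of how a later playbook step or script might consume this output, the sketch below condenses a result into a few fields that are easy to log. It assumes only the two keys shown in the schema above. The helper name `summarize_extraction` and the sample values are hypothetical and do not come from the connector.

```python
import json
from typing import Any, Dict


def summarize_extraction(result: Dict[str, Any], preview_chars: int = 200) -> Dict[str, Any]:
    """Condense an Extract Text result into a few fields that are easy to log."""
    text = result.get("extracted_text") or ""
    metadata = result.get("metadata") or {}
    return {
        "has_text": bool(text.strip()),    # False for empty or whitespace-only text
        "char_count": len(text),
        "preview": text[:preview_chars],   # first characters only, for log output
        "metadata_keys": sorted(metadata),
    }


# Hypothetical sample shaped like the schema above; the values are invented.
sample = json.loads(
    '{"metadata": {"Content-Type": "text/plain"}, "extracted_text": "Hello from a file"}'
)
print(summarize_extraction(sample))
```

Checking `has_text` before further processing is one way to separate files that yielded no text from files that were parsed successfully.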

operation: Extract Artifacts

Input parameters

Parameter | Description
File IRI/Path | Specify the file IRI or path where the file is located. A file IRI is an attribute of a file, which is in turn an attribute of an attachment or an indicator. NOTE: The file name is taken from the /tmp directory.

Output

The output contains the following populated JSON schema:

{
    "extraction_result": {
        "IP": "",
        "CVE": "",
        "MD5": "",
        "URL": "",
        "Host": "",
        "SHA1": "",
        "Email": "",
        "SHA256": "",
        "results": "",
        "Filename": "",
        "Filepath": "",
        "Registry": "",
        "unified_result": "",
        "whitelisted_results": ""
    },
    "status": ""
}
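
The schema above shows the keys but not the value types that the connector returns. The sketch below is one way to pick out the indicator types that contain values, assuming each per-type value is either a single string or a list of strings. That assumption, the helper name `collect_indicators`, and the decision to skip the aggregate keys (`results`, `unified_result`, `whitelisted_results`) are illustrative choices, not documented connector behavior.

```python
from typing import Any, Dict, List

# Per-type keys listed in the schema above; the aggregate keys
# (results, unified_result, whitelisted_results) are deliberately left out.
INDICATOR_KEYS = (
    "IP", "CVE", "MD5", "URL", "Host", "SHA1",
    "Email", "SHA256", "Filename", "Filepath", "Registry",
)


def collect_indicators(response: Dict[str, Any]) -> Dict[str, List[str]]:
    """Return only the indicator types that actually contain values."""
    extraction = response.get("extraction_result") or {}
    found: Dict[str, List[str]] = {}
    for key in INDICATOR_KEYS:
        value = extraction.get(key)
        # Assumption: a value is a single string or a list of strings.
        if isinstance(value, str):
            values = [value] if value else []
        elif isinstance(value, (list, tuple, set)):
            values = [str(v) for v in value]
        else:
            values = []  # missing key, None, or an unexpected type
        if values:
            found[key] = values
    return found
```

A caller could, for example, loop over the returned dictionary and create one indicator record per value.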

operation: Get Backend Config

Input parameters

Parameter | Description
Verbose | Select this option to fetch verbose details.

Output

The output contains the following populated JSON schema:

{
    "Parsers": "",
    "MimeTypes": "",
    "Detectors": ""
}

Included playbooks

The Sample - File Content Extraction - 1.1.0 playbook collection comes bundled with the File Content Extraction connector. These playbooks contain steps that you can use to perform all supported actions. You can see the bundled playbooks in the Automation > Playbooks section in FortiSOAR™ after importing the File Content Extraction connector.

  • Extract Artifacts From File
  • Extract Text From File
  • Get Backend Config Details

Note: If you plan to use any of the sample playbooks in your environment, clone those playbooks and move them to a different collection, because the sample playbook collection is deleted when the connector is upgraded or deleted.

Troubleshooting

The Extract Text action may sometimes fail even after the prerequisites have been met. In such cases:

  1. Clear the connector cache using the following command as root user:
    rm -rf /opt/cyops/configs/integrations/connectors/file-content-extraction_your_connector_version/__pycache__

    For example, to clear the cache of File Content Extraction v1.1.0, the command would be:

    rm -rf /opt/cyops/configs/integrations/connectors/file-content-extraction_1_1_0/__pycache__/
  2. Restart the uwsgi service using the following command as root user:
    systemctl restart uwsgi
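
The cache directory name in step 1 follows the pattern in the example above, where version 1.1.0 becomes the suffix 1_1_0. The sketch below builds that path from a version string so that it can be reviewed before running `rm -rf`. The helper name `pycache_path` is hypothetical, and the assumption that other versions follow the same dots-to-underscores pattern is inferred from the single documented example.

```python
from pathlib import PurePosixPath

# Connector root directory as shown in the commands above.
CONNECTORS_ROOT = PurePosixPath("/opt/cyops/configs/integrations/connectors")


def pycache_path(version: str, name: str = "file-content-extraction") -> str:
    """Build the __pycache__ path for a connector version such as "1.1.0"."""
    # The documented example maps version 1.1.0 to the suffix 1_1_0.
    suffix = version.strip().replace(".", "_")
    return str(CONNECTORS_ROOT / f"{name}_{suffix}" / "__pycache__")


print(pycache_path("1.1.0"))
# prints /opt/cyops/configs/integrations/connectors/file-content-extraction_1_1_0/__pycache__
```

The function only builds the string and deletes nothing; the `rm -rf` and `systemctl restart uwsgi` commands are still run manually as the root user.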