File Content Extraction is a FortiSOAR™ utility to extract text, artifacts, and metadata from almost any file. Internet connectivity is required for the connector to download dependent packages.
This document provides information about the File Content Extraction Connector, which facilitates automated interactions with a File Content Extraction server using FortiSOAR™ playbooks. Add the File Content Extraction Connector as a step in FortiSOAR™ playbooks to perform automated operations with File Content Extraction.
Connector Version: 1.1.0
FortiSOAR™ Version Tested on: 7.4.0-3024
File Content Extraction Version Tested on: 1.1.0
Authored By: Fortinet CSE
Certified: Yes
Following enhancements have been made to the File Content Extraction Connector in version 1.1.0:
Use the Connector Store to install the connector. For the detailed procedure to install a connector, click here.
You can also use the yum command as a root user to install the connector:
yum install cyops-connector-file-content-extraction
The FortiSOAR™ instance must be allowed to communicate with the following URLs:
repo1.maven.org
search.maven.org
Also, add an exception rule to your firewall to allow communication through TCP ports 80 and 443.
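The connectivity requirement above can be checked before installation with a small script. The following is a minimal Python sketch, not part of the connector: the function name `check_endpoints` and its `connect` parameter are hypothetical, and it only verifies that a TCP connection opens, not that package downloads succeed.

```python
import socket

# Hosts and ports taken from the connectivity requirement above.
HOSTS = ["repo1.maven.org", "search.maven.org"]
PORTS = [80, 443]


def check_endpoints(hosts=HOSTS, ports=PORTS, timeout=5.0,
                    connect=socket.create_connection):
    """Return {(host, port): bool}, True when a TCP connection could be opened.

    `connect` is injectable (hypothetical design choice) so the logic can be
    exercised without network access.
    """
    results = {}
    for host in hosts:
        for port in ports:
            try:
                conn = connect((host, port), timeout)
                conn.close()
                results[(host, port)] = True
            except OSError:
                # Covers DNS failures, refused connections, and timeouts.
                results[(host, port)] = False
    return results
```

Calling `check_endpoints()` on the FortiSOAR™ host and printing the returned dictionary shows which host and port pairs are blocked, which may help when writing the firewall exception rule.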
For the procedure to configure a connector, click here.
This connector does not require any configuration before use.
The following automated operations can be included in playbooks, and you can also use the annotations to access the operations:
| Function | Description | Annotation and Category |
|---|---|---|
| Extract Text | Extracts text and metadata from a file based on the file IRI/path that you have specified. | extract_text |
| Extract Artifacts | Extracts artifacts from a file based on the file IRI/path that you have specified. | extract_indicators |
| Get Backend Config | Retrieves extraction backend configuration details based on the file IRI/path that you have specified. | get_backend_config |
NOTE: The File Content Extraction solution pack includes the playbook Extract and Process Text From File, which can be used with any file indicator to extract its content, metadata, and artifacts, as illustrated in the following screenshot:

Operation: Extract Text
| Parameter | Description |
|---|---|
| File IRI/Path | Specify the IRI or path of the file. The file IRI is an attribute of a file, which is in turn an attribute of an attachment or an indicator. NOTE: The file name is taken from the /tmp directory. |
| Set Output Format to XHTML | Select this option to format the extracted text as XHTML. |
The output contains the following populated JSON schema:
{
  "metadata": {},
  "extracted_text": ""
}
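As a rough illustration of how a script or a later playbook step might consume this output, the following Python sketch summarizes a result dictionary. The function name `summarize_extract_text` is hypothetical, and the sketch assumes that `metadata` is a dictionary and `extracted_text` is a string, as the schema above suggests.

```python
def summarize_extract_text(result):
    """Summarize an Extract Text result shaped like the schema above.

    Assumes 'metadata' is a dict and 'extracted_text' is a string;
    missing or null values are treated as empty.
    """
    metadata = result.get("metadata") or {}
    text = result.get("extracted_text") or ""
    return {
        "has_text": bool(text.strip()),      # False for empty or whitespace-only text
        "char_count": len(text),
        "metadata_keys": sorted(metadata),   # names of the metadata fields returned
    }
```

A check such as `has_text` can be useful for deciding whether to continue with artifact extraction or to stop early when a file yields no text.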
Operation: Extract Artifacts
| Parameter | Description |
|---|---|
| File IRI/Path | Specify the IRI or path of the file. The file IRI is an attribute of a file, which is in turn an attribute of an attachment or an indicator. NOTE: The file name is taken from the /tmp directory. |
The output contains the following populated JSON schema:
{
  "extraction_result": {
    "IP": "",
    "CVE": "",
    "MD5": "",
    "URL": "",
    "Host": "",
    "SHA1": "",
    "Email": "",
    "SHA256": "",
    "results": "",
    "Filename": "",
    "Filepath": "",
    "Registry": "",
    "unified_result": "",
    "whitelisted_results": ""
  },
  "status": ""
}
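To show one way this output might be post-processed, the following Python sketch keeps only the indicator types that came back non-empty. The function name `non_empty_indicators` is hypothetical. The schema above shows only empty placeholders, so the actual value types (strings or lists) are an assumption; the sketch simply treats any falsy value as "nothing found".

```python
# Indicator-type keys taken from the schema above; summary fields such as
# 'results', 'unified_result', and 'whitelisted_results' are left out on purpose.
INDICATOR_KEYS = (
    "IP", "CVE", "MD5", "URL", "Host", "SHA1",
    "Email", "SHA256", "Filename", "Filepath", "Registry",
)


def non_empty_indicators(output):
    """Return {indicator_type: value} for every type with a non-empty value."""
    extraction = output.get("extraction_result") or {}
    found = {}
    for key in INDICATOR_KEYS:
        value = extraction.get(key)
        if value:  # skips "", [], None, and missing keys
            found[key] = value
    return found
```

For example, passing an output whose `extraction_result` contains `"IP": ["10.0.0.1"]` and empty strings elsewhere returns a dictionary with the single key `IP`.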
Operation: Get Backend Config
| Parameter | Description |
|---|---|
| Verbose | Select this option to fetch verbose details. |
The output contains the following populated JSON schema:
{
  "Parsers": "",
  "MimeTypes": "",
  "Detectors": ""
}
The Sample - File Content Extraction - 1.1.0 playbook collection comes bundled with the File Content Extraction connector. These playbooks contain steps that you can use to perform all supported actions. You can see the bundled playbooks in the Automation > Playbooks section in FortiSOAR™ after importing the File Content Extraction connector.
Note: If you plan to use any of the sample playbooks in your environment, clone those playbooks and move them to a different collection, because the sample playbook collection is deleted when the connector is upgraded or deleted.
The Extract Text action may sometimes fail even after the prerequisites have been added. In such cases:
Clear the connector's __pycache__ directory using the following command as the root user:
rm -rf /opt/cyops/configs/integrations/connectors/file-content-extraction_your_connector_version/__pycache__
For example, to clear the cache of File Content Extraction v1.1.0, the command is:
rm -rf /opt/cyops/configs/integrations/connectors/file-content-extraction_1_1_0/__pycache__/
Then restart the uwsgi service using the following command as the root user:
systemctl restart uwsgi
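The cache path in the commands above depends on the connector version. The following Python sketch builds that path from a version string. The function name `pycache_path` is hypothetical, and the rule of replacing dots with underscores is inferred from the single v1.1.0 example above, so verify the directory name on your instance before deleting anything.

```python
def pycache_path(version,
                 base="/opt/cyops/configs/integrations/connectors"):
    """Build the connector's __pycache__ path for a given version string.

    Assumption: the directory suffix is the version with dots replaced by
    underscores, as in the v1.1.0 example (1.1.0 -> 1_1_0).
    """
    folder = "file-content-extraction_" + version.replace(".", "_")
    return f"{base}/{folder}/__pycache__"
```

For version `1.1.0`, the function returns `/opt/cyops/configs/integrations/connectors/file-content-extraction_1_1_0/__pycache__`, which matches the example command.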