Thorium 101: Inside CISA’s Open Source Malware Analysis Platform

CISA’s new open-source malware analysis tool Thorium is designed for customization, safety, and real-world security team workflows. This post introduces its core features and how to get started.


Thorium is a malware analysis platform designed to simplify analyst workflows by consolidating tools into a single interface. CISA describes it as “a highly scalable, distributed malware analysis and data generation framework”. Sound familiar? We published a guide to a similar open-source malware analysis tool, Assemblyline, earlier this year (see the comparison below). Thorium is available as a GitHub repo maintained by CISA. This post outlines what Thorium is and how to use it.

At a Glance Comparison: Thorium vs. Assemblyline

Figure 1: Comparison of Thorium and Assemblyline.
💡 For more details about Assemblyline, read our blog post “Assemblyline 101 - Open Source Malware Triage”.

What is Thorium?

Thorium is an open-source malware analysis platform designed to simplify incident response, triage, and file analysis by providing secure file ingestion and storage along with automated analysis. It also gives analysts easy access to stored copies of analyzed files and their metadata.

When to use Thorium?

Thorium suits teams that conduct file triage or analysis. Because analysis results are stored in the platform alongside the files, other analysts can review them and quickly search for additional information.

Key Features

  • Scalable platform for analysis
  • Static and dynamic analysis sandboxes
  • Easy-to-use interfaces
  • REST APIs for automation purposes
  • Multi-tenant capabilities
  • Full-text search
  • Key/Value tagging for data labels

CaRT and safely sharing files

A common problem for malware analysts and researchers is how to store and share files safely. Some store samples in password-protected zip archives or change the file extension so that the file cannot be executed directly. These measures can prevent accidental execution of malicious files; however, the files can still be quarantined by security solutions.

This is where CaRT (Compressed and RC4 Transport) comes in. CaRT is a file format developed by Canada's Communications Security Establishment (CSE) for storing and transferring malware together with its metadata. It encrypts files so that they cannot be executed accidentally or quarantined by security solutions. Every file uploaded to Thorium is converted to CaRT format. Files are also downloaded in CaRT format and must be “unCaRTed” before they can be executed.

Thorium Architecture

Thorium was designed to run within a Kubernetes (K8s) cluster. For cluster deployments, CISA notes that a block storage provider and S3 storage are required. Ceph is recommended for on-premises deployments.

Components

API

The REST API allows the different Thorium components to coordinate activity and complete tasks. Multiple API instances can run on different servers. This setup enables high availability: if one server running an API instance fails, the rest of the Thorium platform remains operational.

Different databases are used to store various types of information.

| Database | Resources |
|----------|-----------|
| Redis | Reactions, scheduling streams |
| Scylla | File metadata, reaction logs |
| Elastic | Tool results < 1 MiB |
| S3 | All files, tool results > 1 MiB |
| Jaeger | API request logs |
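
The size split listed above can be pictured as a simple routing rule. The sketch below is only an illustration of that rule, not Thorium's code: it reads the S3 entry as covering tool results over 1 MiB, the function name `result_backend` is hypothetical, and sending a result of exactly 1 MiB to S3 is an assumption because the source only gives "<" and ">".

```shell
# Hypothetical helper: choose a storage backend for a tool result
# based on its size in bytes. Not part of Thorium.
ONE_MIB=1048576

result_backend() {
    if [ "$1" -lt "$ONE_MIB" ]; then
        echo "elastic"
    else
        # Assumption: a result of exactly 1 MiB is treated like a larger one.
        echo "s3"
    fi
}

result_backend 2048      # prints: elastic
result_backend 5242880   # prints: s3
```

Keeping small results in a search index and large ones in object storage is a common trade-off: small results stay searchable, while large blobs do not bloat the index.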

Scaler

The scaler is used to determine when reactions or jobs are created. This system enables priority jobs to be executed on a portion of the platform while the rest of the platform executes other jobs. 

Agent

The agent is used to run tools by:

  • Downloading the required data
  • Executing tools
  • Streaming logs via API
  • Uploading results
  • Cleaning up temporary artifacts
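
The five steps above map naturally onto a small job wrapper. The following is a rough sketch of that lifecycle using only local files, not the Thorium agent itself: `run_job` is a hypothetical name, a local copy stands in for the API download, a byte count stands in for the tool, stderr stands in for the log stream, and a results directory stands in for the upload target.

```shell
# Hypothetical stand-in for a single agent job. Not part of Thorium.
# run_job <sample_path> <results_dir>
run_job() {
    aj_sample=$1
    aj_results=$2
    aj_work=$(mktemp -d) || return 1

    # 1. Download the required data (a local copy stands in for the API download)
    cp "$aj_sample" "$aj_work/sample" || { rm -rf "$aj_work"; return 1; }

    # 2. Execute the tool (here: count the sample's bytes)
    # 3. Stream logs (stderr stands in for the log API)
    echo "log: tool started" >&2
    aj_size=$(wc -c < "$aj_work/sample")
    printf '%s\n' "$((aj_size))" > "$aj_work/result.txt"
    echo "log: tool finished" >&2

    # 4. Upload results (a directory stands in for the results API)
    mkdir -p "$aj_results" && cp "$aj_work/result.txt" "$aj_results/result.txt"
    aj_status=$?

    # 5. Clean up temporary artifacts, whether or not the upload worked
    rm -rf "$aj_work"
    return "$aj_status"
}
```

The working directory is removed even when the upload step fails, so a failed job does not leave copies of a sample behind on the worker.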

Reactor

The reactor uses the Thorium API to get information about its nodes and creates or terminates workers to match the workload. 
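
Matching workers to the workload is a reconciliation problem: compare what is wanted with what is running and act on the difference. The sketch below shows only that comparison, under the assumption that both numbers are already known. `reconcile_workers` is a hypothetical name, and how the real reactor obtains these counts from the API is not shown.

```shell
# Hypothetical reconciliation step. Not part of Thorium.
# reconcile_workers <desired_count> <running_count>
reconcile_workers() {
    rw_desired=$1
    rw_running=$2
    if [ "$rw_desired" -gt "$rw_running" ]; then
        echo "spawn $((rw_desired - rw_running))"
    elif [ "$rw_desired" -lt "$rw_running" ]; then
        echo "terminate $((rw_running - rw_desired))"
    else
        echo "noop"
    fi
}

reconcile_workers 5 2   # prints: spawn 3
reconcile_workers 1 4   # prints: terminate 3
```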

Tracing/Logging

Thorium handles logging through tracing. Unified tracing sends logs to a central file server or to disk.

Event Handler

The event handler is responsible for triggering reactions (instances of a pipeline being run) based on actions that took place within Thorium. 

K8s Deployment

ℹ️ This post does not cover deploying Thorium on a Kubernetes (K8s) cluster in detail.

The Kubernetes cluster requires a storage provisioner that can allocate persistent volume claims for the database and tracing services. Admins also need account credentials and permissions to create buckets within an S3-compatible storage interface. 

The components that make up the Thorium deployment include:

  • Traefik
  • Rook
  • Redis
  • Scylla
  • Elastic
  • Tracing
    • Quickwit
    • Jaeger

Detailed installation instructions for each of these services are available in the Thorium GitHub repo (see References).

Local Deployment

Minithor uses minikube to deploy Thorium to a single instance. This type of deployment is not recommended for production use and should only be used for development and testing. Minithor deployments do not provide the scalability that Thorium was designed to give teams.

The instructions below are based on the contents of the GitHub repo.

Deploy Minikube

Install and start minikube by running the install script:

./install-linux

The content of the install script is provided below:

#!/bin/bash

curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube && rm minikube-linux-amd64
# set resources for VM
minikube config set cpus 8
minikube config set memory 15976
# alias kubectl command to minikube subcommand and add to .bashrc/.zshrc
touch ~/.bashrc
if [[ $SHELL == *"bash"* ]] && ! grep -q "alias kubectl=\"minikube kubectl --\"" ~/.bashrc; then
    echo "alias kubectl=\"minikube kubectl --\"" >> ~/.bashrc
    source ~/.bashrc
fi
touch ~/.zshrc
if [[ $SHELL == *"zsh"* ]] && ! grep -q "alias kubectl=\"minikube kubectl --\"" ~/.zshrc; then
    echo "alias kubectl=\"minikube kubectl --\"" >> ~/.zshrc
    source ~/.zshrc
fi
# start of k8s cluster
minikube start --cni calico
# add required plugins
minikube addons enable csi-hostpath-driver
minikube addons enable ingress
minikube addons enable ingress-dns
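
The script asks minikube for 8 CPUs and 15976 MB of memory, so it can be worth checking the host before running it. Here is a small pre-flight sketch, not something that ships with Thorium: `check_resources` is a hypothetical helper that only compares numbers, and the commented usage line assumes a Linux host with `nproc` and `/proc/meminfo`.

```shell
# Values taken from the install script above.
REQUIRED_CPUS=8
REQUIRED_MEM_MB=15976

# Hypothetical pre-flight check. Not part of Thorium.
# check_resources <available_cpus> <available_mem_mb>
# Prints "ok" and returns 0 when both values are sufficient;
# otherwise prints each shortfall and returns 1.
check_resources() {
    cr_status=0
    if [ "$1" -lt "$REQUIRED_CPUS" ]; then
        echo "need $REQUIRED_CPUS CPUs, found $1"
        cr_status=1
    fi
    if [ "$2" -lt "$REQUIRED_MEM_MB" ]; then
        echo "need $REQUIRED_MEM_MB MB of memory, found $2"
        cr_status=1
    fi
    if [ "$cr_status" -eq 0 ]; then
        echo "ok"
    fi
    return "$cr_status"
}

# Example on Linux (left commented out so the sketch has no side effects):
# check_resources "$(nproc)" "$(awk '/MemTotal/ {print int($2 / 1024)}' /proc/meminfo)"
```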

Once minikube is installed, you need to create a file called .dockerconfigjson that contains the registry credentials for the account used to pull the Thorium container image. After that, the dependencies can be installed using the deploy script.
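
For reference, a .dockerconfigjson file normally follows the standard Docker config layout that Kubernetes uses for image pull secrets: an `auths` object keyed by registry, with a base64-encoded `username:token` pair. The sketch below generates that layout. The function name and all values are placeholders, and which registry and token Thorium actually expects is something to take from the repo's instructions, not from this example.

```shell
# Hypothetical helper that prints a Docker config JSON document.
# make_dockerconfig <registry> <username> <token>
# Assumption: the inputs contain no quotes or backslashes (no JSON escaping is done).
make_dockerconfig() {
    # base64 may wrap long output, so strip newlines
    dc_auth=$(printf '%s:%s' "$2" "$3" | base64 | tr -d '\n')
    printf '{"auths":{"%s":{"auth":"%s"}}}\n' "$1" "$dc_auth"
}

# Example with placeholder values (left commented out so nothing is written):
# (umask 077; make_dockerconfig registry.example.com alice "$REGISTRY_TOKEN" > .dockerconfigjson)
```

Writing the file under `umask 077` keeps the credentials readable only by the current user.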

User Roles

Groups are used to grant permissions and access to resources for different users. All resources are owned by the person who creates them. 

System Roles

System Roles are used to perform actions at the global level. The roles include:

  • User
  • Analyst
  • Developer
  • Admin

The User role can use Thorium to conduct analysis but cannot create new Pipelines. The Developer role has more permissions than the User role. The Admin role allows users to modify any resource within Thorium.

Figure 2: Description of each user role available within Thorium. Source: CISA

Group Roles

Group roles are used to restrict a user’s ability to conduct operations on a group’s resources. Group resources include:

  • Images
  • Pipelines
  • Repos
  • Files
  • Tags
  • Comments
  • Analysis results

Figure 3: Permissions assigned to different roles. Source: CISA

File Origins

File origins are a Thorium feature for describing where a file came from and how it relates to other files.

Figure 4: Upload screen within Thorium. Source: CISA

The table below lists the various file origins available within the platform.

| Type of File Origin | Explanation |
|---------------------|-------------|
| Downloaded | The file was downloaded from a URL |
| Transformed | The file was obtained through transformations |
| Unpacked | The file was unpacked |
| Wire | The file was captured through the network |
| Incident | The file was collected during an incident |
| Memory Dump | The file was obtained from a memory dump |
| Carved | The file was extracted from another file |
| PCAP | The file was extracted from a network capture |
| Unknown | The file was obtained from an unknown source |

Figure 5: Subfields for each of the file origins. These subfields are used to provide insights for each identified file.

Images & Pipelines

Tools are referred to as Images within Thorium. As of September 4th, 2025, no tools have been released as part of the Thorium build; however, users and organizations can add their own tools as needed. Images can be created by users with the Developer role.

Images can be added via the Web UI. This is done through the Image configuration settings, where users specify the name, Group, and Image, among other settings. Once this information has been entered, Thorium can run the tool and display its output.

Figure 6: UI used to create an Image. Source: CISA
Figure 7: Parameters used to configure Images.

Pipelines

Pipelines are a way to run multiple Images in an automated fashion. They run different tools sequentially and collect their outputs. Pipelines are created in the Pipeline creation menu, where users provide a Pipeline name, the SLA, the Image order, and the Group.

Figure 8: Dialog shown when creating a Pipeline. Source: CISA
💡 Images must already be created for them to be used as part of a Pipeline in Thorium.
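
To make the idea of a Pipeline concrete, here is a rough local analogy, not how Thorium implements Pipelines: a function that runs a list of commands against one sample in the given order and keeps each command's output in its own file. `run_pipeline` and the output naming scheme are hypothetical, and the tools are assumed to be bare command names that accept a file path as their only argument.

```shell
# Hypothetical local analogy of a Pipeline. Not part of Thorium.
# run_pipeline <sample_path> <output_dir> <tool> [<tool> ...]
run_pipeline() {
    pl_sample=$1
    pl_outdir=$2
    shift 2
    mkdir -p "$pl_outdir" || return 1
    pl_step=0
    for pl_tool in "$@"; do
        pl_step=$((pl_step + 1))
        # Run each tool in order; stop at the first failure
        "$pl_tool" "$pl_sample" > "${pl_outdir}/${pl_step}-${pl_tool}.txt" || return 1
    done
}

# Example (left commented out; assumes these tools are installed):
# run_pipeline sample.bin results file sha256sum strings
```

Numbering the output files preserves the order in which the tools ran, which mirrors the Image order set when a Pipeline is created.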

Thorium helps organizations scale and standardize their file triage and analysis steps. Keeping tools within a single platform simplifies file analysis and management for analysts. Pipelines automate these processes: for example, a Pipeline can run static analysis on files using multiple tools and leave the results ready for a human analyst to review when required.

Thorctl

Thorium is also accompanied by a command-line tool that can be used to perform different operations. The CLI tool can be used to:

  • Upload files or Git repos
  • Download files or repos
  • Start reactions
  • Download results
  • List files

Download instructions for both Unix and Windows are available in the Thorium documentation (see References).

An example of the CLI usage is provided below:

$ thorctl files upload --help
Upload some files and/or directories to Thorium

Usage: thorctl files upload [OPTIONS] --file-groups <GROUPS> [TARGETS]...

Arguments:
  [TARGETS]...  The files and or folders to upload

Options:
  -g, --groups <GROUPS>            The groups to upload these files to
  -p, --pipelines <PIPELINES>      The pipelines to spawn for all files that are uploaded
  -t, --tags <TAGS>                The tags to add to any files uploaded where key/value is separated by a deliminator
      --deliminator <DELIMINATOR>  The deliminator character to use when splitting tags into key/values [default: =]
  -f, --filter <FILTER>            Any regular expressions to use to determine which files to upload
  -s, --skip <SKIP>                Any regular expressions to use to determine which files to skip
      --folder-tags <FOLDER_TAGS>  The tags keys to use for each folder name starting at the root of the specified targets
  -h, --help                       Print help
  -V, --version                    Print version
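
The help text says tags are given as key/value pairs separated by a delimiter, which defaults to `=`. The sketch below illustrates that kind of split. It is not thorctl's implementation: `split_tag` is a hypothetical name, and splitting at the first occurrence of the delimiter is an assumption.

```shell
# Hypothetical illustration of splitting a tag into key and value.
# split_tag <tag> [delimiter]
# Prints the key and the value on separate lines.
split_tag() {
    st_delim=${2:-=}
    case $1 in
        *"$st_delim"*)
            # Assumption: split at the first occurrence of the delimiter
            st_key=${1%%"$st_delim"*}
            st_value=${1#*"$st_delim"}
            printf '%s\n%s\n' "$st_key" "$st_value"
            ;;
        *)
            echo "tag '$1' has no '$st_delim' delimiter" >&2
            return 1
            ;;
    esac
}
```

A custom delimiter is useful when tag values themselves contain `=`, for example tags that carry URLs or encoded strings.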

Conclusion

As a new entrant to the malware analysis space, Thorium provides cybersecurity teams with a highly customizable and free platform to enhance malware analysis, triage, and incident response workflows. By combining several essential capabilities in one tool, it helps analysts with file organization, storage, analysis, and collaboration. While still relatively immature, Thorium is a promising option for teams seeking affordable and flexible solutions.

References

GitHub - cisagov/thorium: A scalable file analysis and data generation platform that allows users to easily orchestrate arbitrary docker/vm/shell tools at scale.
Intro - Thorium