Assemblyline 101 - Open Source Malware Triage

Learn how to install and use Assemblyline, the open-source malware triage tool. This 101 includes an overview, deployment walkthrough, example use case, and resources.

While analysts can analyze files individually, that process takes time and may require a plethora of tools. A single platform that automates initial analysis and detection lets analysts sift through the noise and focus on the files that need more attention. This is where Assemblyline, an open-source tool created by the Canadian Centre for Cyber Security (CCCS), comes in. Assemblyline scans files with various tools (called ‘services’) within the platform and collects the information about each file in one place. This blog explores:

  • What is Assemblyline?
  • Installing Assemblyline using Docker
  • Maldoc analysis using Assemblyline

What is Assemblyline?

Assemblyline is an open-source malware detection tool that allows cybersecurity analysts to quickly triage files within a single platform. The tool consists of modules called services that collect information about a file and can alert on suspicious artifacts. A key benefit of a tool like Assemblyline is that it tags submissions with the results from each service as they are analyzed, and it can detect duplicate submissions. The tool also assigns each file a score based on the information collected; this score can be used to identify malicious files or files that warrant further investigation.

Who should use Assemblyline? Assemblyline is ideal for security research and defense teams, threat researchers, and incident response professionals who need to automate and streamline the analysis, classification, and prioritization of malware samples. It is especially helpful for security teams handling large volumes of malware and seeking a scalable, customizable solution for efficient triage.

Services Available within Assemblyline

Services are modules available within Assemblyline that analyze the submitted file and extract items that may indicate maliciousness. Services fall under two categories: Assemblyline services and community services. 

  1. Assemblyline services are bundled with the Assemblyline build and are maintained by the Assemblyline development team.
  2. Community services have been created by the community to augment existing functionality. 

Assemblyline Services

The table below gives an overview of some of the services maintained by the Assemblyline team. The complete list, along with links to the service manifests, is available here.

Service Name      | Description
----------------- | ------------------------------------------------------------
Batchdeobfuscator | Deobfuscates batch files
CAPA              | Identifies capabilities in executable files
ConfigExtractor   | Extracts malware configurations, including lists of C2 servers
CAPE              | Sandbox for dynamic malware analysis
DeobfuScripter    | Static script deobfuscator
Floss             | Extracts obfuscated strings from files
Oletools          | Extracts data from OLE and XML documents
PeePDF            | Python-based tool for analyzing PDF files
Suricata          | Network-based detection for scanning network captures
YARA              | Creates detections based on patterns within files

Additional Community Services

The following community services are listed in the Assemblyline documentation:

Service Name         | Description                                         | Author      | Link
-------------------- | --------------------------------------------------- | ----------- | ----------------
AutoItRipper         | AutoIt unpacker                                     | NVISO       | Link
ClamAV               | Submits a file to ClamAV and displays the result    | NVISO       | Link
MalwareBazaar        | Retrieves MalwareBazaar results                     | NVISO       | Link
MsgParser            | MSG extractor                                       | NVISO       | Link
MetaDefender Sandbox | Submits a file or URL to MetaDefender Sandbox       | OPSWAT      | Link
PythonExeUnpacker    | Python EXE unpacker                                 | NVISO       | Link
StegFinder           | Uses StegExpose to identify data embedded in images | NVISO       | Link
Unfurl               | Expands shortened URLs                              | NVISO       | Link
UrlScanIo            | Submits data to URLScan.io                          | NVISO       | Link
Windows Defender     | Windows Defender service                            | Adam McHugh | Link unavailable

💡
Note: The GitHub repositories for the services created by NVISO have been archived, so these services may no longer be under active development and may target older versions of Assemblyline.

Details on how to build a community service are available here.

Installing and Configuring Assemblyline

💡
Assemblyline is also available on AWS as a subscription that can be deployed with minimal user interaction.

Assemblyline can be deployed on a single instance or in a clustered environment, and the right choice depends on the team's objectives. According to CCCS, both deployment mechanisms have the same analysis capabilities, but clustered environments scale better and offer redundancy and failover.

Certain resources will need to be hosted on external infrastructure.

Figure 1 below compares the features of the different deployment mechanisms:

Figure 1: Feature Comparison of the different Assemblyline Deployment Mechanisms. Source: Installation Manual

Installation Steps

The instructions below are from the Docker Installation Guide for Assemblyline:

  1. Install Docker
  2. Configure Docker to use a larger address pool
  3. Setup Assemblyline
  4. Deploy Assemblyline

1. Install Docker

sudo apt-get install -y apt-transport-https ca-certificates curl gnupg software-properties-common
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
echo \
"deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
"$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update -y
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo ln -s /usr/libexec/docker/cli-plugins/docker-compose /usr/local/bin/docker-compose
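Before moving on, it can help to confirm that the tools used in the rest of this walkthrough are on the PATH. The snippet below is a minimal sketch, not part of the official guide; the helper name require_cmd and the list of commands checked are our own choices.

```shell
#!/usr/bin/env bash
# Preflight sketch: report whether each command used later in this walkthrough is installed.
# require_cmd is a hypothetical helper name, not part of Assemblyline or Docker.

require_cmd() {
  # command -v succeeds only if the shell can find the command
  if command -v "$1" > /dev/null 2>&1; then
    echo "found:   $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

missing=0
for cmd in docker docker-compose git openssl; do
  require_cmd "$cmd" || missing=$((missing + 1))
done
echo "$missing required command(s) missing"
```

Running sudo docker-compose version afterwards also confirms that the symlink to the Compose plugin created above works.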

2. Configure Docker to use a larger address pool

Create or edit /etc/docker/daemon.json and add the following lines:

{
  "default-address-pools":
  [
    {"base":"10.201.0.0/16","size":24}
  ]
}

Restart Docker and check its status:

sudo service docker restart
sudo service docker status
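Because a malformed daemon.json prevents Docker from starting, one option is to write the configuration to a temporary file and validate it before copying it into place. This sketch assumes /etc/docker/daemon.json does not already contain other settings (if it does, merge the key by hand instead of overwriting the file) and that python3 is available for the JSON check.

```shell
#!/usr/bin/env bash
# Sketch: write the address-pool setting to a temporary file and validate it
# before it replaces /etc/docker/daemon.json.
# Assumption: daemon.json holds no other settings (otherwise merge by hand).

tmp_config="$(mktemp)"
cat > "$tmp_config" <<'EOF'
{
  "default-address-pools": [
    {"base": "10.201.0.0/16", "size": 24}
  ]
}
EOF

# python3 -m json.tool exits non-zero when the input is not valid JSON
if python3 -m json.tool "$tmp_config" > /dev/null; then
  echo "valid JSON: $tmp_config"
else
  echo "invalid JSON, not installing it" >&2
fi
```

Once it validates, it can be installed with sudo install -m 0644 "$tmp_config" /etc/docker/daemon.json, followed by the Docker restart described above.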

3. Setup Assemblyline

Download the Assemblyline Docker Compose files:

mkdir ~/git
cd ~/git
git clone https://github.com/CybercentreCanada/assemblyline-docker-compose.git

There are two types of deployments:

  1. Assemblyline only
  2. Assemblyline with ELK monitoring stack

The ELK monitoring stack can be used to track Assemblyline metrics.

Figure 2: A Kibana dashboard used to display metrics of an Assemblyline instance.

The deployment steps are the same for both; only the directory that gets copied differs. The minimal_appliance directory sets up Assemblyline alone, while the full_appliance directory also sets up ELK for monitoring:

💡
In this walkthrough we deployed to the directory ~/deployments; other directories also work, provided the file system has sufficient space for the installation.

To deploy Assemblyline only, use the code snippet below:

mkdir -p ~/deployments
cp -R ~/git/assemblyline-docker-compose/minimal_appliance ~/deployments/assemblyline
cd ~/deployments/assemblyline

To deploy Assemblyline and the ELK stack for metrics, use the code snippet below:

mkdir -p ~/deployments
cp -R ~/git/assemblyline-docker-compose/full_appliance ~/deployments/assemblyline
cd ~/deployments/assemblyline

This copies the configuration files into the deployment directory ~/deployments/assemblyline. The config/config.yaml file is preconfigured for use with docker-compose, and the .env file contains all the default passwords.
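The tip above notes that any directory works as long as its file system has enough space. A quick way to check is df; the sketch below wraps it in a helper. The helper name has_free_space and the 50 GB threshold are assumptions for illustration only, not figures from the installation guide, so size the threshold to your own service selection.

```shell
#!/usr/bin/env bash
# Sketch: check free space before copying an appliance directory.
# has_free_space and MIN_FREE_GB are hypothetical; 50 GB is an arbitrary example value.

has_free_space() {
  local dir="$1" min_gb="$2" avail_kb
  # df -Pk prints POSIX-format output in 1 KiB blocks; field 4 of line 2 is "Available"
  avail_kb="$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')"
  [ "$avail_kb" -ge $(( min_gb * 1024 * 1024 )) ]
}

APPLIANCE="minimal_appliance"   # or "full_appliance" to include the ELK stack
MIN_FREE_GB=50

if has_free_space "$HOME" "$MIN_FREE_GB"; then
  echo "Enough space under $HOME to copy $APPLIANCE"
else
  echo "Less than ${MIN_FREE_GB} GB free under $HOME" >&2
fi
```

$HOME is checked rather than ~/deployments because the deployment directory may not exist yet.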

4. Deploy Assemblyline

  • Create an SSL certificate:
openssl req -nodes -x509 -newkey rsa:4096 -keyout ~/deployments/assemblyline/config/nginx.key -out ~/deployments/assemblyline/config/nginx.crt -days 365 -subj "/C=CA/ST=Ontario/L=Ottawa/O=CCCS/CN=assemblyline.local"
Figure 3: The private key is generated and saved in the config directory.
  • Pull the required Docker images using the commands:

cd ~/deployments/assemblyline
sudo docker-compose pull
Figure 4: Docker containers being downloaded.
  • Build the docker containers using sudo docker-compose build
Figure 5: Results of the docker-compose build command
  • Pull the service images using sudo docker-compose -f bootstrap-compose.yaml pull
⚠️
This step may take some time because all the service images are downloaded.
Figure 6: Assemblyline Services being downloaded
  • Once all the services have been pulled, Assemblyline can be launched using the commands:
cd ~/deployments/assemblyline
sudo docker-compose up -d --wait
sudo docker-compose -f bootstrap-compose.yaml up
Figure 7: Launching Containers
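If the browser later complains about the certificate, it can help to confirm that the nginx.crt and nginx.key files generated in step 4 belong together and to see when the certificate expires. This is a sketch using standard openssl subcommands; the helper name cert_matches_key is our own, and CONFIG_DIR simply follows this walkthrough's directory layout.

```shell
#!/usr/bin/env bash
# Sketch: confirm the self-signed certificate matches its private key.
# cert_matches_key is a hypothetical helper; CONFIG_DIR follows this walkthrough's layout.

cert_matches_key() {
  local crt="$1" key="$2"
  # Both commands print the public key in the same PEM (SubjectPublicKeyInfo) format
  [ "$(openssl x509 -in "$crt" -noout -pubkey)" = "$(openssl pkey -in "$key" -pubout)" ]
}

CONFIG_DIR="$HOME/deployments/assemblyline/config"
if [ -f "$CONFIG_DIR/nginx.crt" ] && [ -f "$CONFIG_DIR/nginx.key" ]; then
  # Show who the certificate was issued to and when it expires (-days 365 above)
  openssl x509 -in "$CONFIG_DIR/nginx.crt" -noout -subject -enddate
  if cert_matches_key "$CONFIG_DIR/nginx.crt" "$CONFIG_DIR/nginx.key"; then
    echo "certificate and key match"
  else
    echo "certificate and key do NOT match" >&2
  fi
fi
```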

Once all the services have been created, the console will output the list of services that have been launched along with the docker IDs.

Figure 8: Docker Instances being Started.
Figure 9: Terminal output showing services starting successfully.

Once the Docker containers are fully up, the services are running and can be accessed through the web interface at https://127.0.0.1 (port 443), using the default credentials specified in the .env file located in ~/deployments/assemblyline.

If the web interface is not reachable at that address, check the logs to confirm that the services are running, and use docker ps to see which port the nginx frontend is using.

Figure 10: Result of the docker ps command showing that the Nginx Web Server is available at 127.0.0.1:443
Figure 11: Login Page for the Assemblyline Web UI.
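Because the containers can take a while to become ready, a small polling loop saves refreshing the browser. The sketch below uses curl against the address given above; -k is needed because the certificate is self-signed. The function name wait_for_ui, the retry count, and the two-second delay are arbitrary choices, and a response with any HTTP status is treated as "up".

```shell
#!/usr/bin/env bash
# Sketch: poll the Assemblyline web UI until it answers.
# wait_for_ui is a hypothetical helper; any HTTP status counts as "reachable".

wait_for_ui() {
  local url="$1" max_tries="${2:-30}" i=1 code
  while [ "$i" -le "$max_tries" ]; do
    # -k accepts the self-signed certificate; -w prints only the status code.
    # curl reports 000 when no HTTP response was received at all.
    code="$(curl -k -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")"
    if [ -n "$code" ] && [ "$code" != "000" ]; then
      echo "$url answered with HTTP $code after $i attempt(s)"
      return 0
    fi
    sleep 2
    i=$((i + 1))
  done
  echo "no response from $url after $max_tries attempt(s)" >&2
  return 1
}
```

For example, wait_for_ui https://127.0.0.1 60. If it never answers, sudo docker ps --format '{{.Names}}\t{{.Ports}}' lists the published ports of each container, which shows where the nginx frontend is listening.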

Updating a Dockerized Assemblyline Instance

A Docker deployment of Assemblyline can be updated using the following commands:

cd ~/deployments/assemblyline
sudo docker-compose pull
sudo docker-compose build
sudo docker-compose up -d
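Because each command depends on the previous one succeeding, one option is to wrap them so the sequence stops at the first failure. This is a sketch under our own naming (update_assemblyline, DRY_RUN); setting DRY_RUN=1 only prints the commands, which is handy for checking the script before running it for real.

```shell
#!/usr/bin/env bash
# Sketch: run the documented update commands in order, stopping at the first failure.
# update_assemblyline and DRY_RUN are hypothetical names used only in this example.

# The body runs in a subshell ( ... ) so the cd does not change the caller's directory.
update_assemblyline() (
  deploy_dir="${1:-$HOME/deployments/assemblyline}"
  run=""
  [ "${DRY_RUN:-0}" = "1" ] && run="echo"   # DRY_RUN=1 prints instead of executing

  cd "$deploy_dir" || exit 1
  $run sudo docker-compose pull  || exit 1
  $run sudo docker-compose build || exit 1
  $run sudo docker-compose up -d || exit 1
)
```

DRY_RUN=1 update_assemblyline previews the three commands; update_assemblyline on its own runs them against ~/deployments/assemblyline.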

Checking Logs

Assemblyline logs can be viewed for all core components at once or for a specific component.

For all core components:

cd ~/deployments/assemblyline
sudo docker-compose logs

For a specific component (in this example, ui):

cd ~/deployments/assemblyline
sudo docker-compose logs ui
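With many containers, the combined output of docker-compose logs gets long. One way to get an overview is to count error lines per service. The sketch below assumes the default "<container> | <message>" prefix that docker-compose puts on each line (use --no-color so color codes do not end up in the names); matching on the word "error" is a rough heuristic, not an Assemblyline log convention, and the helper name is our own.

```shell
#!/usr/bin/env bash
# Sketch: count log lines whose message contains "error" (any case) per container.
# Reads `docker-compose logs --no-color` output on stdin; the "name | message"
# layout is an assumption about docker-compose's default log prefix.

count_errors_by_service() {
  awk '{
      p = index($0, "|")
      if (p == 0) next                  # skip lines without the expected prefix
      svc = substr($0, 1, p - 1)
      sub(/[ \t]+$/, "", svc)           # trim the padding after the container name
      if (toupper(substr($0, p + 1)) ~ /ERROR/) n[svc]++
    }
    END { for (s in n) print s, n[s] }' | sort
}
```

Usage: cd ~/deployments/assemblyline && sudo docker-compose logs --no-color | count_errors_by_service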

MalDoc Analysis Example

One of the benefits of Assemblyline is that it keeps the results of multiple analyzers in one place, making it easy for analysts or responders to review results. In this example, we upload a Word document that uses remote template injection to download additional payloads.

The Word document is an agreement for enterprise services. When the file is opened, Word connects to the URL hardcoded in the relationship file _rels\document.xml.rels and loads content from it. The hardcoded URL points to an RTF file on an adversary-controlled domain, as shown in Figure 13.

Figure 12: Content of the Maldoc. Source: Triage
Figure 13: Content of the _rels\document.xml.rels. The malicious RTF file is highlighted in yellow.
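The same external relationship can also be found by hand. A .docx file is a ZIP archive, so its relationship files can be printed with unzip -p and filtered for entries marked TargetMode="External". The helper below is a rough text-based sketch (a proper XML parser is more robust), and the name external_targets is our own.

```shell
#!/usr/bin/env bash
# Sketch: list external relationship targets (e.g. remote templates) in OOXML .rels data.
# external_targets is a hypothetical helper; it uses grep/sed, not a real XML parser.

external_targets() {
  # 1) isolate each <Relationship ...> element, 2) keep only the external ones,
  # 3) print the value of their Target attribute
  grep -o '<Relationship [^>]*>' |
    grep 'TargetMode="External"' |
    sed -n 's/.* Target="\([^"]*\)".*/\1/p'
}
```

Usage: unzip -p sample.docx '*.rels' | external_targets, where sample.docx stands in for the document being examined.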

Once a user uploads the file to Assemblyline, it starts the analysis process where each service runs against the file and results are collated. The verdict is updated at the end of the analysis based on the information returned by the services. 

Figure 14: The submission results for the malicious document after analysis completes, indicating that the file is malicious.

Each submission is given its own unique identifier, and the submission information shows which analysis features were selected. Users can choose which services to run to speed up scans, and can adjust the submission priority.
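Because Assemblyline recognizes duplicate submissions and reports file hashes, computing a sample's SHA-256 locally before uploading makes it easy to look up earlier results or compare against the hashes listed for extracted files. A minimal sketch with sha256sum (GNU coreutils); the helper name sample_sha256 is our own.

```shell
#!/usr/bin/env bash
# Sketch: print a file's SHA-256 so the sample can be looked up before it is uploaded.
# sample_sha256 is a hypothetical helper; sha256sum is part of GNU coreutils.

sample_sha256() {
  # Reading from stdin makes sha256sum print "<hash>  -", so unusual file names
  # cannot alter the output; awk keeps only the hash field.
  sha256sum < "$1" | awk '{ print $1 }'
}
```

For example, sample_sha256 agreement.docx, where agreement.docx stands in for the sample's file name.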

Under the Submission Information section is the Heuristics section, which summarizes the results of the analysis.

Figure 15: The Heuristics section indicating that a known IOC was identified and that content within the XML files was identified as suspicious.

Here, the services identify an IOC that is part of a blocklist. Clicking on any of the heuristics provides more information about the finding. In the case of the Badlisted IOC heuristic, the results show that a domain within the document is on the threatview.io domain blocklist.
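To reproduce a Badlisted IOC hit outside Assemblyline, a domain can be checked against a local copy of a blocklist. This sketch assumes a plain-text file with one domain per line (we do not reproduce any specific feed's format here) and does an exact, case-insensitive match; the helper name in_blocklist is our own.

```shell
#!/usr/bin/env bash
# Sketch: exact, case-insensitive lookup of a domain in a local blocklist file.
# in_blocklist and the one-domain-per-line format are assumptions for illustration.

in_blocklist() {
  local domain="$1" list="$2"
  # tr strips Windows line endings; grep -F (fixed string), -x (whole line),
  # -i (ignore case), -q (no output, exit status only)
  tr -d '\r' < "$list" | grep -Fxqi -- "$domain"
}
```

Because the match is exact, a subdomain such as cdn.bad.example does not match an entry for bad.example; whether to also match parent domains is a design choice for your own tooling.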

Moreover, OLETools identifies an external relationship within the document: a hardcoded URL that would be used to establish a connection to a malicious domain.

Figure 16: The OLETools Service identified the IOC shown in Figure 13 within the XML files that make up the Word document.

Potential indicators of compromise are shown in a separate section on the submission page. The ‘Indicators of Compromise’ section can be used to quickly see any IPs, domains, or hashes related to the submitted file. In this example, the IOCs include the URL identified by OLETools and its domain. The tool also identifies several Microsoft URLs, but color-codes them green to indicate that they are not malicious.

Figure 17: All identified IOCs are color-coded based on their reputation. Green indicates that the value is likely benign.
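The IOC list pairs each URL with its domain. When exporting IOCs to other tools, the domain can be derived from a URL with a few sed substitutions; the sketch below is a simplification (no internationalized domain handling, no validation), and the helper name url_host is our own.

```shell
#!/usr/bin/env bash
# Sketch: extract the lowercase host name from a URL.
# url_host is a hypothetical helper; it does not validate the URL.
# The sed expressions, in order: drop the scheme ("http://"), drop any path,
# query, or fragment, drop "user@" credentials, then drop a trailing ":port".

url_host() {
  printf '%s\n' "$1" |
    sed -e 's|^[A-Za-z][A-Za-z0-9+.-]*://||' \
        -e 's|[/?#].*$||' \
        -e 's|^.*@||' \
        -e 's|:[0-9]*$||' |
    tr '[:upper:]' '[:lower:]'
}
```

For example, url_host 'http://User@Example.COM:8080/a/b.rtf?x=1' prints example.com.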

When a user clicks on a particular IOC, the associated file is highlighted. This can streamline manual analysis by pinpointing which file a user should look into. Furthermore, the ‘Files’ section lists all the files identified within the sample. In our example, the Word document contains several XML files, one of which Assemblyline has flagged as malicious. Each file is extracted and run through the services for individual analysis. The results for these files can be viewed by clicking on the filename under the Files section.

When we dig deeper into the extracted file named 9177f499.xml, the Ancestry service shows where it originated. This tree illustrates its relation to the originally submitted file and to any services that generated findings.

Figure 18: The Ancestry service showing the relation of the XML file to the submission file.
Figure 19: Each service is run on the extracted files. The FrankenStrings service identified several URLs within the XML file.
Figure 20: The IOC identified in Figure 13 appears in a TI vendor

Assemblyline helps reduce the number of benign files that investigators spend time analyzing during the day. By running files through an automated pipeline of services, investigators can get a sense of what a file does before manual inspection and prioritize threats more effectively. Because submission results are stored on a central platform, Assemblyline can also serve as a single triage source for samples. For additional resources on Assemblyline and its capabilities, check out the references below.

References