Pulsedive Op-Ed | How to Make STIX Stickier

As someone who has built import/export for STIX/TAXII from scratch, I think the standards can be improved in a few ways to reduce implementation friction for vendors.

UUIDs as Identifiers

STIX objects must use UUIDs as object identifiers. In theory, this makes sense, as it makes it very unlikely that two objects will share an ID across TAXII servers.

STIX object identifiers, from the STIX 2.1 docs

However, it also means that vendors who wish to implement identifiers correctly must generate and store a UUID string alongside each object, so that the ID stays persistent and the object can be queried later through TAXII. If the vendor already uses some other ID scheme (numeric IDs, hashes, unique strings, etc.), UUIDs must be generated for every object stored in the vendor's database, which may be time-consuming to run and takes up additional storage space.

✅

Recommendation: Allow more flexibility for object IDs, while still requiring some level of uniqueness across TAXII servers (using a hash, server hostname, etc., in combination with whatever convention the vendor may already be using).
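As a rough illustration of the idea, an ID could be derived from the server hostname plus whatever ID the vendor already stores, so nothing new has to be persisted. The sketch below uses Python's standard uuid.uuid5 for the hashing step. The function name derive_id, the hostname, and the numeric ID are made up for the example, and whether a given STIX version accepts IDs built this way for a given object type is a separate question.

```python
import uuid


def derive_id(object_type: str, hostname: str, vendor_id: str) -> str:
    """Derive a stable, STIX-style identifier from an existing vendor ID.

    Hypothetical helper for illustration only.
    """
    # Build a namespace from the server hostname so that two vendors who
    # both have a local object "1234" still end up with different IDs.
    namespace = uuid.uuid5(uuid.NAMESPACE_DNS, hostname)
    # uuid5 is a deterministic hash: the same inputs always produce the
    # same UUID, so the result can be recomputed instead of stored.
    derived = uuid.uuid5(namespace, f"{object_type}:{vendor_id}")
    return f"{object_type}--{derived}"


# Example with made-up values: the same call always returns the same ID.
first = derive_id("indicator", "taxii.example.com", "1234")
second = derive_id("indicator", "taxii.example.com", "1234")
other_server = derive_id("indicator", "taxii.example.org", "1234")
```

Here first and second are identical, while other_server differs because the hostname is part of the hash.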

Streamlining Objects

There are almost 20 different object types, many of which have overlapping attributes and serve similar purposes. They can probably be combined.

STIX Domain Objects, from the STIX 2.1 docs

At Pulsedive, intrusion-set, group, campaign, malware, and some others are all considered "threats" with a category string. This makes it really easy to parse threat data and enables a consistent UI.

Pulsedive considers actors, campaigns, malware, and vulnerabilities as "Threats," with a category string attached.
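To make the "one type plus a category string" idea concrete, here is a minimal sketch of how several STIX object types could collapse into a single record. The mapping table, the Threat class, and the to_threat function are hypothetical and simplified for illustration; they are not Pulsedive's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from STIX object types to one category string.
CATEGORY_BY_STIX_TYPE = {
    "intrusion-set": "group",
    "threat-actor": "group",
    "campaign": "campaign",
    "malware": "malware",
    "vulnerability": "vulnerability",
}


@dataclass
class Threat:
    name: str
    category: str
    description: str = ""


def to_threat(stix_object: dict) -> Optional[Threat]:
    """Collapse a STIX object into a generic threat, or None if unmapped."""
    category = CATEGORY_BY_STIX_TYPE.get(stix_object.get("type", ""))
    if category is None:
        return None
    return Threat(
        name=stix_object.get("name", ""),
        category=category,
        description=stix_object.get("description", ""),
    )


# Two different STIX types end up as the same kind of record.
actor = to_threat({"type": "intrusion-set", "name": "Example Group"})
family = to_threat({"type": "malware", "name": "ExampleRAT"})
```

Because every record has the same shape, the parsing code and the UI only need to handle one structure and can branch on the category where it matters.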

There are also cyber-observable objects, which are different. As technologies evolve, this list of object definitions may keep growing. Implementing them is not required, but a generic dictionary of entity type, attribute, and value would probably be easier to define.

STIX Cyber-observable Objects, from the STIX 2.1 docs
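One possible reading of a generic type/attribute/value dictionary is a flat list of triples, which needs no new definition each time a technology appears. The sketch below flattens a cyber-observable-style dictionary into such triples. The function name to_triples and the choice of which bookkeeping keys to skip are assumptions made for the example.

```python
# Keys treated as bookkeeping rather than attributes (an assumption).
BOOKKEEPING_KEYS = {"type", "id", "spec_version"}


def to_triples(observable: dict) -> list:
    """Flatten an observable into (entity type, attribute, value) triples."""
    entity_type = observable["type"]
    return [
        (entity_type, attribute, value)
        for attribute, value in observable.items()
        if attribute not in BOOKKEEPING_KEYS
    ]


triples = to_triples({
    "type": "domain-name",
    "spec_version": "2.1",
    "value": "example.com",
})
```

In this example, triples contains the single entry ("domain-name", "value", "example.com").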

✅

Recommendation: Consolidate object types with similar attributes and purposes.

IOC Handling

Indicator objects use a STIX pattern, written in a patterning language invented for STIX, to represent activity that may be indicative of suspicious behavior. While this enables powerful granularity for detection, it also means that you have to parse not only the STIX JSON data but also the pattern string to get the indicator values out.

Observations come in the form of a pattern expression, instead of a raw value.

STIX pattern documentation
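To show the extra parsing step, here is a minimal sketch that pulls values out of simple patterns with a regular expression. It only handles plain equality comparisons against quoted strings and ignores the rest of the patterning language (other operators, qualifiers, non-string literals), so treat it as an illustration of the friction rather than a real pattern parser. The names COMPARISON and extract_values are made up for the example.

```python
import re

# Matches simple comparisons such as:  ipv4-addr:value = '198.51.100.7'
# Group 1: object type, group 2: object path, group 3: quoted string body.
COMPARISON = re.compile(r"([a-z0-9-]+):([\w.\-']+)\s*=\s*'((?:[^'\\]|\\.)*)'")


def extract_values(pattern: str) -> list:
    """Return (type, path, value) for each simple equality comparison."""
    results = []
    for object_type, path, raw_value in COMPARISON.findall(pattern):
        # Undo backslash escapes (\' and \\) inside the string literal.
        value = re.sub(r"\\(.)", r"\1", raw_value)
        results.append((object_type, path, value))
    return results


found = extract_values(
    "[ipv4-addr:value = '198.51.100.7'] OR [domain-name:value = 'example.com']"
)
```

For the pattern above, found holds one entry for the IP address and one for the domain. Anything more complex than this needs a proper grammar-based parser.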

There may also be some confusion about whether the average IOC shop should put IPs/URLs/domains into cyber-observable objects, put multiple IOCs into a single Indicator object pattern, or use separate Indicator objects. (We do separate Indicator objects at Pulsedive, but I've seen all three.)
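The difference between two of those options is easier to see side by side. The sketch below builds the same two IOCs once as separate Indicator objects and once as a single Indicator whose pattern joins them with OR. The helper names quote and make_indicator and the sample values are invented for the example, and only a minimal set of Indicator properties is filled in.

```python
import uuid
from datetime import datetime, timezone


def quote(value: str) -> str:
    """Escape backslashes and single quotes for a pattern string literal."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def make_indicator(pattern: str) -> dict:
    """Build a minimal Indicator-style dictionary (illustrative only)."""
    now = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    now = now.replace("+00:00", "Z")
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "pattern": pattern,
        "pattern_type": "stix",
        "valid_from": now,
    }


iocs = [("ipv4-addr", "198.51.100.7"), ("domain-name", "example.com")]

# Option A: one Indicator object per IOC.
separate = [make_indicator(f"[{t}:value = '{quote(v)}']") for t, v in iocs]

# Option B: one Indicator object whose pattern covers every IOC.
combined = make_indicator(
    " OR ".join(f"[{t}:value = '{quote(v)}']" for t, v in iocs)
)
```

A consumer has to be ready for both shapes: with option A each object yields one value, while with option B a single object can yield many.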

In theory, IOCs should be malicious or indicative of compromise (literally, an Indicator of Compromise). In practice, false positives and user-submitted observables destroy the utopian differentiation between "real" IOCs and just regular observables in infosec products. They're all "IOCs", with risk scores.

✅

Recommendation: Allow both raw values and patterning, and consolidate observables and Indicator objects.

Pagination and Timestamps

A huge benefit of TAXII is that clients can page through an arbitrarily large number of objects. There are multiple ways to do it: following the "next" parameter from one page to the following one, or using the "added_after" filter to flip through objects by timestamp.

From the TAXII 2.1 docs

If you have many objects with the same timestamp, you must use the "next" parameter in tandem with the "added_after" filter. Let's say you have 100 objects with the timestamp 2020-01-01 00:00:00, but your page limit is 10. Using "added_after" alone, you will miss 90 objects if you skip ahead to the next timestamp after the first page.

A great benefit to TAXII paging is that the "next" parameter can be whatever the vendor chooses, so there is a lot of flexibility to support whatever data model the vendor uses.
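The shared-timestamp problem can be reproduced with a small in-memory stand-in for an objects endpoint. This is a toy model, not a TAXII server or client: get_page, the date_added field on each object, and the use of a numeric offset as the "next" token are all assumptions made for the example. It compares a client that advances "added_after" after every page with one that follows "next".

```python
def get_page(objects, added_after=None, next_token=None, limit=10):
    """Toy endpoint: filter by timestamp, then slice one page."""
    matching = [
        o for o in objects
        if added_after is None or o["date_added"] > added_after
    ]
    # In this toy model the opaque "next" token is just an offset.
    start = int(next_token) if next_token else 0
    more = start + limit < len(matching)
    return {
        "objects": matching[start:start + limit],
        "more": more,
        "next": str(start + limit) if more else None,
    }


def fetch_with_added_after_only(objects, limit=10):
    """Advance the timestamp after each page (loses same-timestamp objects)."""
    seen, added_after = [], None
    while True:
        page = get_page(objects, added_after=added_after, limit=limit)
        if not page["objects"]:
            return seen
        seen += page["objects"]
        added_after = page["objects"][-1]["date_added"]


def fetch_with_next(objects, limit=10):
    """Keep the filter fixed and follow the "next" token until done."""
    seen, token = [], None
    while True:
        page = get_page(objects, next_token=token, limit=limit)
        seen += page["objects"]
        if not page["more"]:
            return seen
        token = page["next"]


# 100 objects share one timestamp; 5 more arrive a day later.
store = [{"id": i, "date_added": "2020-01-01T00:00:00Z"} for i in range(100)]
store += [{"id": 100 + i, "date_added": "2020-01-02T00:00:00Z"} for i in range(5)]

timestamp_only = fetch_with_added_after_only(store)
with_next = fetch_with_next(store)
```

The timestamp-only client ends up with 15 objects (the first 10 plus the 5 later ones) and silently misses 90, while the client that follows "next" retrieves all 105.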

✅

Recommendation: Simplify paging guidance.

Limited Library Support

Personally, I haven't seen many libraries for interacting with STIX/TAXII in languages other than Python. Vendors across the security industry use a wide variety of languages and technologies, including PHP, JS/Node.js, C/C++, and Rust, to build their solutions. This means that if they wish to implement STIX/TAXII in their offerings, they would either have to interact with a separate Python component or build it from scratch (as we did).

✅

Recommendation: Invest in broader language support.

Closing Thoughts

STIX/TAXII largely standardized threat intelligence sharing and is definitely needed in the industry. But if you are wondering why vendor adoption has been slow and inconsistent, it's because the standards can be somewhat tedious to implement and a little complicated to consume. Plus, library support is limited in languages other than Python.

STIX/TAXII was a huge step forward in the field, and I am looking forward to continued improvements in future versions.

To take advantage of Pulsedive's STIX/TAXII implementations across our free and premium services, check out our API docs.