Op-Ed: How to Make STIX Stickier
As someone who has built import/export for STIX/TAXII from scratch, I think it can be improved in a few ways to reduce friction for vendors when implementing.
UUIDs as Identifiers
STIX objects must use UUIDs as object identifiers. In theory, this makes sense, as it ensures that two objects won't have the same ID across TAXII servers.
However, it also means that vendors who wish to implement them correctly must generate and store the UUID strings alongside their objects, so that the UUID for each object stays persistent and can be queried later on using TAXII. If some other ID scheme is already in use (numeric IDs, hashes, unique strings, etc.), UUIDs must be generated for all objects stored in the vendor's database, which may be time-consuming to execute and can take up additional storage space.
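To make the cost concrete, here is a minimal sketch of the "generate once, store, and reuse" pattern described above. The `records` dictionary and the `ensure_stix_id` helper are hypothetical stand-ins for a vendor's database table and its backfill logic, not part of any STIX library; the only STIX-specific detail is the `type--uuid` identifier format.

```python
import uuid

# Hypothetical stand-in for a vendor table keyed by an internal numeric ID.
records = {
    101: {"type": "indicator", "value": "198.51.100.7"},
    102: {"type": "malware", "value": "ExampleLoader"},
}


def ensure_stix_id(record):
    """Return the record's STIX ID, generating and storing it on first use."""
    if "stix_id" not in record:
        # STIX identifiers take the form "<object-type>--<UUID>".
        record["stix_id"] = f"{record['type']}--{uuid.uuid4()}"
    return record["stix_id"]


first = ensure_stix_id(records[101])
second = ensure_stix_id(records[101])

# The ID is only stable because it was written back to the record.
print(first == second)
```

In a real system the write-back would be an extra column plus a one-time backfill over every existing row, which is where the time and storage cost comes from.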
Streamlining Objects
There are almost 20 different object types, and many of them have overlapping attributes and serve similar purposes. They can probably be combined.
At Pulsedive, intrusion-set, group, campaign, malware, and some others are all considered "threats" with a category string. This makes it really easy to parse threat data and enables a consistent UI.
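As a rough illustration of that idea (not Pulsedive's actual schema), the sketch below collapses a few STIX object types into one generic "threat" record that carries the original type as a category string. The field names and the `to_threat` helper are assumptions made up for this example.

```python
# STIX object types treated as "threats" in this sketch (an assumed subset).
THREAT_TYPES = ("intrusion-set", "campaign", "malware")


def to_threat(stix_obj):
    """Map a STIX object (as a dict) to a generic threat record, or None."""
    if stix_obj.get("type") not in THREAT_TYPES:
        return None
    return {
        "name": stix_obj.get("name", ""),
        "category": stix_obj["type"],  # the original STIX type becomes a plain string
        "stix_id": stix_obj.get("id"),
    }


campaign = {"type": "campaign", "id": "campaign--example", "name": "Example Campaign"}
malware = {"type": "malware", "id": "malware--example", "name": "ExampleLoader"}

threats = [to_threat(campaign), to_threat(malware)]
```

Every record then has the same shape, so the parser and the UI only need to handle one structure and can branch on `category` where it matters.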
There are also cyber-observable objects, which are defined separately. As technologies evolve, this may become an ever-growing list of object definitions. Implementing them is not required, but some kind of generic entity type/attribute/value dictionary would probably be easier to define.
IOC Handling
Indicator objects use a STIX pattern, written in a pattern language invented for STIX to represent activity that may be indicative of suspicious behavior. While it enables powerful granularity for detection, it also means that you not only have to parse the STIX JSON data, but also extract the indicator values from a pattern string.
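For simple cases, that second parsing step can be approximated with a regular expression. The sketch below is an assumption-laden shortcut, not a real STIX pattern parser: it only recognizes `type:path = 'value'` comparisons and ignores other operators, qualifiers, and escape handling. The `extract_values` helper and the regex are my own illustration.

```python
import re

# Matches simple equality comparisons such as: ipv4-addr:value = '198.51.100.7'
SIMPLE_COMPARISON = re.compile(
    r"([a-z0-9-]+):([A-Za-z0-9_.'\-]+)\s*=\s*'((?:[^'\\]|\\.)*)'"
)


def extract_values(pattern):
    """Return (object_type, object_path, value) tuples found in a pattern string."""
    return SIMPLE_COMPARISON.findall(pattern)


single = extract_values("[ipv4-addr:value = '198.51.100.7']")
combined = extract_values(
    "[domain-name:value = 'example.com' OR url:value = 'http://example.com/a']"
)

print(single)
print(combined)
```

Anything beyond flat equality checks needs a proper grammar-based parser, which is part of why consuming Indicator objects takes more work than reading a plain list of values.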
There may also be some confusion about whether the average IOC shop should put IPs/URLs/domains into cyber-observable objects, put multiple IOCs into a single Indicator object pattern, or use separate Indicator objects. (We use separate Indicator objects at Pulsedive, but I've seen all three.)
In theory, IOCs should be malicious or indicative of compromise (literally, an Indicator of Compromise). In practice, false positives and user-submitted observables destroy the utopian differentiation between "real" IOCs and just regular observables in infosec products. They're all "IOCs", with risk scores.
Pagination and Timestamps
A huge benefit of TAXII is the ability to page through an arbitrarily large number of objects. There are multiple ways to do it: using a "next" parameter, or using "added_after" to step through by timestamp.
If you have many objects with the same timestamp, you must use the "next" parameter in tandem with the "added_after" filter. Let's say you have 100 objects with the timestamp 2020-01-01 00:00:00, but your page limit is 10. Using "added_after" alone, you will miss 90 of those objects, because after the first page the filter skips ahead to the next timestamp.
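The scenario can be reproduced with a small in-memory simulation. Everything here is hypothetical: `get_page` stands in for a TAXII objects endpoint, and the "next" value is assumed to be a plain offset into the filtered result set (a real server can use any opaque token).

```python
from datetime import datetime, timezone

T0 = datetime(2020, 1, 1, tzinfo=timezone.utc)
T1 = datetime(2020, 1, 2, tzinfo=timezone.utc)

# 100 objects share one timestamp; 5 more were added a day later.
OBJECTS = [(T0, f"obj-{i}") for i in range(100)]
OBJECTS += [(T1, f"obj-{i}") for i in range(100, 105)]


def get_page(added_after=None, next_token=None, limit=10):
    """Hypothetical server: filter by timestamp, then slice by offset."""
    matching = [o for o in OBJECTS if added_after is None or o[0] > added_after]
    start = int(next_token) if next_token is not None else 0
    page = matching[start:start + limit]
    has_more = start + limit < len(matching)
    return page, (str(start + limit) if has_more else None)


def fetch_by_timestamp_only(limit=10):
    """Advance only by added_after, using the last timestamp seen."""
    seen, cursor = [], None
    while True:
        page, _ = get_page(added_after=cursor, limit=limit)
        if not page:
            return seen
        seen.extend(page)
        cursor = page[-1][0]


def fetch_with_next(added_after=None, limit=10):
    """Keep added_after fixed and follow the next token until it runs out."""
    seen, token = [], None
    while True:
        page, token = get_page(added_after=added_after, next_token=token, limit=limit)
        seen.extend(page)
        if token is None:
            return seen


print(len(fetch_by_timestamp_only()))  # 15: ten from the first timestamp, five from the second
print(len(fetch_with_next()))          # 105: every object
```

The timestamp-only client silently drops 90 of the 100 objects that share the first timestamp, while the client that follows "next" retrieves all of them.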
A great benefit to TAXII paging is that the "next" parameter can be whatever the vendor chooses, so there is a lot of flexibility to support whatever data model the vendor uses.
Limited Library Support
Personally, I haven't seen many libraries for interacting with STIX/TAXII in languages other than Python. Vendors across the security industry use a wide variety of languages and technologies, including PHP, JS/Node.js, C/C++, and Rust, to build their solutions. This means that if they wish to implement STIX/TAXII in their offerings, they would either have to interact with a separate Python component or build it from scratch (as we did).
Closing Thoughts
STIX/TAXII has largely standardized threat intelligence sharing and is definitely needed in the industry. But if you are wondering why vendor adoption has been slow and inconsistent, it's because STIX/TAXII can be somewhat tedious to implement and a little complicated to consume. Plus, library support is limited in languages other than Python.
STIX/TAXII was a huge step forward in the field, and I am looking forward to continued improvements in future versions.
To take advantage of Pulsedive's STIX/TAXII implementations across our free and premium services, check out our API docs.