Avoiding buyer’s remorse in the cybersecurity data analytics marketplace

By: Jon Hawes

June 4, 2019



Today, lots of cybersecurity vendors are making bold claims about how their technologies use Artificial Intelligence (AI) and/or Machine Learning (ML) to detect threats and improve the efficiency of security operations teams.

Marketing for these products usually takes one of three positions to present a value proposition to potential buyers.

“Your Security Operations team is overwhelmed with alerts, most of which don’t matter. Our ML / AI auto-magically finds the alerts that do matter, so that your team can just focus on events that have a very high probability of being malicious, and ignore the noise.”

Vendors taking this position usually need to plug into an existing technology stack, and make use of the data that security teams are already a) collecting from multiple sources (e.g. network, device, access, application), and b) centralising in a platform for correlation. This may be a commercial SIEM (Security Information and Event Management platform), or an open source stack such as ELK or Hadoop. The ‘burning problem’ these vendors focus on for ‘product-market fit’ is the human effort (hence time and cost) it takes to investigate and triage the high number of false positives from rules-based correlation alerting – and the fear of missing the meaningful alerts in the noise. Because they do not directly collect telemetry or logs, they are reliant on the upstream quality of what the security team has centralised.

“Old technology in <specific category blah> has failed to keep pace with detecting new and evolving threat tactics. Our next-generation technology uses ML / AI to detect new tactics by using <supervised / unsupervised> learning.”

Vendors selling on this basis usually have a technology that installs at a single specific layer of network, devices, or applications. They then apply ML models to telemetry to generate alerts. Vendors in this camp split into two main types.

First, anomaly detection. In essence, something changes, it looks vaguely ‘threaty’, an alert is generated. The problem with anomaly detection is that, as it turns out, lots of things can appear malicious when in fact they are benign activity. Welcome to lots of tuning.

Second, threat pattern detection. Here, a vendor requires a number of signals of threaty activity to ‘stack together’ before they alert. This generates fewer alerts and false positives, but there’s also a fine balance in the ML models about what is tuned out vs interpreted as a signal.
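To make the difference between the two types concrete, here is a minimal Python sketch. It is an illustration only, not any vendor’s actual logic: the `Signal` fields, the signal names and the `min_signals` threshold are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    host: str
    name: str        # hypothetical signal label, e.g. "rare_process"
    anomalous: bool  # True if the signal deviates from the host's baseline


def anomaly_alerts(signals):
    """Anomaly detection: every deviation from baseline raises an alert."""
    return [s.host for s in signals if s.anomalous]


def pattern_alerts(signals, min_signals=3):
    """Threat pattern detection: alert only when enough distinct
    suspicious signals stack together on the same host."""
    per_host = {}
    for s in signals:
        if s.anomalous:
            per_host.setdefault(s.host, set()).add(s.name)
    return sorted(h for h, names in per_host.items() if len(names) >= min_signals)


signals = [
    Signal("laptop-1", "new_admin_login", True),
    Signal("laptop-1", "rare_process", True),
    Signal("laptop-1", "large_upload", True),
    Signal("laptop-2", "rare_process", True),  # e.g. a benign software update
    Signal("laptop-3", "large_upload", True),  # e.g. a benign backup job
]

print(len(anomaly_alerts(signals)))  # 5 alerts to triage
print(pattern_alerts(signals))       # ['laptop-1']
```

The trade-off shows up in `min_signals`: set it too high and a real attack that only trips two signals is tuned out; set it to 1 and you are back to anomaly detection.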

These vendors usually look to replace a discrete existing technology that either needs a lot of maintenance (e.g. updating and tuning rules), or has a high false positive ratio, which makes it impractical to use as a source you correlate with other data sources. The pain they offer to take away is the requirement for lots of specialist analysts to stare at single source dashboards, and then go to other systems’ dashboards when an alert fires. They promise ‘single source’ alerting that is high fidelity in terms of ‘must investigate as high chance of threatyness’.


“We provide a holistic risk score for <users / devices> so that you can continuously measure threatyness level, and automate actions if thresholds are crossed!”

Vendors offering risk scores and continuous monitoring usually take data from several sources (e.g. user access logs, network traffic logs such as web proxy, and maybe application logs). They then analyse patterns across these sources and flag if the patterns suggest either a) the user account has been compromised by a threat, or b) the user themselves is acting maliciously.


The reality in this market is that much of the logic is not an ML model, but a rules based system. There may be some ML used for anomaly detection in the mix, but rules dominate because thresholds need to be easy and quick to change and configure, and because different populations of employees act completely differently, so one size doesn’t fit all.
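A rules based risk score of this kind can be sketched in a few lines of Python. Everything here is hypothetical (the rule names, the weights, the population names and the thresholds); the point is only to show why per-population thresholds are easier to change in configuration than in a trained model.

```python
# Hypothetical rule weights: each triggered rule adds to the risk score.
RULE_WEIGHTS = {
    "login_outside_usual_hours": 20,
    "access_from_new_country": 40,
    "bulk_file_download": 30,
}

# Hypothetical thresholds per employee population, because what is
# normal for one group is unusual for another.
THRESHOLDS = {"developers": 70, "finance": 50}
DEFAULT_THRESHOLD = 60


def risk_score(triggered_rules):
    """Sum the weights of the distinct rules that fired."""
    return sum(RULE_WEIGHTS.get(rule, 0) for rule in set(triggered_rules))


def should_act(population, triggered_rules):
    """True if the score crosses the threshold for this population."""
    threshold = THRESHOLDS.get(population, DEFAULT_THRESHOLD)
    return risk_score(triggered_rules) >= threshold


rules = ["login_outside_usual_hours", "bulk_file_download"]  # score 50
print(should_act("developers", rules))  # False
print(should_act("finance", rules))     # True
```

The same activity triggers an action for one population and not for another, which is exactly the kind of behaviour teams want to adjust quickly without retraining anything.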

The pain these vendors are focused on is the need for context around alerts and decisions (i.e. timelines of activity, change over time, and the need to get a baseline of activity to understand what ‘normal’ looks like before configuring and tuning alerts).

While vendors are good at messaging about the problems we face, that doesn’t mean that they have identified, or built, a good way to solve them. As a consequence, buyers in this marketplace face multiple challenges when trying to assess the level of value a technology that uses (or claims to use) ML can deliver.

Most vendors who say they use ML or AI as part of their product provide neither a clear guide about the definitive results their product can and cannot deliver, nor evidence about where their value in threat detection starts and stops. This means unless you have a domain expert on your team who can thoroughly test a product against your requirements (and do so in the reality of your production environment), you cannot understand the trade-offs that a vendor has made as part of their product development life-cycle, and what your experience will be in managing the signals from that technology to prioritise your security operations resources.


The good news is that for the dimension of ‘ability to detect threat tactics’, much of the industry is moving towards using a common framework called MITRE ATT&CK to compare the coverage vendors have across attack techniques. While this is very helpful, the not so good news is that there are many more areas concerning features, functionality and API usability that buyers need to understand. For most of these, there is no practitioner-developed industry framework available to work out how ‘stage appropriate’ a technology is for your company and the maturity of your security function.
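As a rough illustration of how a coverage comparison can work in practice, the Python sketch below compares the techniques you care about with what each vendor claims to detect. The technique IDs are real ATT&CK identifiers, but the priority list and the vendor claims are invented for the example.

```python
# Techniques that matter for the threat actors we face (an assumed list):
# T1566 Phishing, T1078 Valid Accounts,
# T1059 Command and Scripting Interpreter, T1003 OS Credential Dumping.
priority_techniques = {"T1566", "T1078", "T1059", "T1003"}

# Hypothetical vendor claims, e.g. taken from their coverage matrices.
vendor_coverage = {
    "vendor_a": {"T1566", "T1059"},
    "vendor_b": {"T1078", "T1059", "T1003", "T1021"},  # T1021 Remote Services
}


def coverage_report(priority, claimed):
    """Summarise which priority techniques a vendor claims and which it misses."""
    covered = priority & claimed
    return {
        "covered": sorted(covered),
        "gaps": sorted(priority - claimed),
        "ratio": len(covered) / len(priority),
    }


for vendor, claimed in vendor_coverage.items():
    print(vendor, coverage_report(priority_techniques, claimed))
```

A claimed technique says nothing about detection quality, so a report like this is a starting point for questions rather than a score.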

When you get a demo of a technology, you’ll usually be shown how a security analyst can interact with a user interface. The demo focuses on the state of nirvana that your teams will be in once the product is deployed, configured and fully operationalised. The agony of the process to get there is glossed over. It’s really difficult, unless you have contacts in other companies that have deployed a technology successfully, to gauge how much effort will be needed for the design, tuning and workflow development phases of an implementation, in order to deliver strong value. In turn, this means there can be a big lag before you can start taking advantage of any promises of automation.

This isn’t unique to security by any means, but when it comes to using ML models to achieve improvements in detection efficacy, you need to know how much up-front work is required, how specialist that work is likely to be, and what kind of impact will be delivered for how much work. It’s not unusual to hear stories of ML-based detection technology taking 6 months or more to operationalise, with ongoing tuning required beyond that.

So how can we navigate this market as buyers, in order to reduce the chances of remorse on our investments?

First, do you even need ML / AI to solve a real problem that your security operations team have right now? Are there burning issues where periodic manual analysis of a few high value data sources can deliver the same, or more value? And once you have the output of some analysis, are there workflows in place to act on the output? Figuring out whether you can solve a problem in a spreadsheet before using vendor analytics is a good benchmark for what problems you have, how complex they are, and how easy they are to solve with your current workflows.


Second, what data do you have available across your estate, and is it what you need to detect either the tactics of credible threat actors that you face, or actions in your environment that may lead to a level of financial loss that is unacceptable for your business? Whether you need to feed vendor ML models with data, or you need to correlate their output with other data sources for investigations, understanding what you have access to today is essential. Here are some helpful questions to ask about the sensors you have available (i.e. technology that generates telemetry, logs and alerts that you can use to feed data analytics technologies and processes) before you engage with any vendor:


1 – What sensors do we have across our various technology environments?

It’s important to know not only what you have per environment, but also whether you have a lot of entropy across on-prem, cloud, and managed service provider environments, as this will mean different tech stacks to manage.


2 – For a specific category of sensor, what is the level of variation across environments?

It’s not unusual in large enterprise environments that grow by acquisition to have several different vendor technologies doing the same thing (e.g. 3 different on-prem anti-virus vendors, all with different licence end-dates).
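Even a small script over a sensor inventory can surface this kind of variation. The Python sketch below assumes a hypothetical inventory held as (environment, category, vendor) rows; the values are placeholders.

```python
from collections import defaultdict

# Hypothetical inventory rows: (environment, sensor category, vendor).
inventory = [
    ("on-prem", "anti-virus", "vendor_a"),
    ("on-prem", "anti-virus", "vendor_b"),
    ("on-prem", "anti-virus", "vendor_c"),
    ("cloud", "anti-virus", "vendor_a"),
    ("on-prem", "web-proxy", "vendor_d"),
    ("cloud", "web-proxy", "vendor_d"),
]


def vendors_per_category(rows):
    """Map each sensor category to the distinct vendors providing it."""
    vendors = defaultdict(set)
    for _environment, category, vendor in rows:
        vendors[category].add(vendor)
    return {category: sorted(names) for category, names in vendors.items()}


def categories_with_variation(rows):
    """Categories where more than one vendor does the same job."""
    return sorted(
        category
        for category, names in vendors_per_category(rows).items()
        if len(names) > 1
    )


print(vendors_per_category(inventory))
print(categories_with_variation(inventory))  # ['anti-virus']
```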


3 – How will our sensor eco-system change over time?

Any plans to rationalise technology (either to achieve cost savings or make vendor management easier) will impact the data you have and the transformations you may need to do to that data. This is because most vendors have at least slightly different formats for data fields (even common ones like dates).
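Dates are a good example of the transformation work involved. The Python sketch below normalises three timestamp styles to UTC in ISO 8601. The vendor names and formats are assumptions for illustration, as is the choice to treat timestamps without a timezone as UTC; in a real estate you would need to confirm what each sensor actually emits.

```python
from datetime import datetime, timezone

# Hypothetical timestamp formats for two vendors; a third is assumed
# to emit Unix epoch seconds.
FORMATS = {
    "vendor_a": "%Y-%m-%dT%H:%M:%S%z",  # 2019-06-04T09:30:00+0200
    "vendor_b": "%d/%m/%Y %H:%M:%S",    # 04/06/2019 07:30:00 (assumed UTC)
}


def to_utc_iso(vendor, raw):
    """Convert a vendor-specific timestamp string to UTC ISO 8601."""
    if vendor == "vendor_c":
        parsed = datetime.fromtimestamp(int(raw), tz=timezone.utc)
    else:
        parsed = datetime.strptime(raw, FORMATS[vendor])
        if parsed.tzinfo is None:
            # Assumption: timestamps without an offset are already in UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).isoformat()


print(to_utc_iso("vendor_a", "2019-06-04T09:30:00+0200"))
print(to_utc_iso("vendor_b", "04/06/2019 07:30:00"))
print(to_utc_iso("vendor_c", "1559633400"))
# each prints 2019-06-04T07:30:00+00:00
```

Swapping one vendor for another means revisiting mappings like this for every field you correlate on, not just dates.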


What data are sensors generating currently?

This is the available data output (overall format and fields within that format), based on the modules you currently have licensed, and how they have been configured.

In what formats is data available for a) export and b) transport?

You’ll want to know if you can ship data from a technology in streaming or batch, any restrictions on API calls, and so on.

How consistently is data generated, with what coverage across environments?

There may be coverage gaps or technology failures that mean there are gaps in data history; knowing what monitoring is in place for data quality is important.

How long is data stored locally per sensor?

Before thinking about shipping data to a collector or central storage point, it’s important to know how frequently, or at what storage volume, logs get overwritten.
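Where logs are overwritten once a storage cap is reached, a back-of-the-envelope calculation tells you how long data survives on the sensor. The Python sketch below uses made-up figures and assumes a roughly constant daily log volume, which real environments rarely have.

```python
def retention_days(local_storage_gb, daily_log_gb):
    """Approximate days of history kept before the oldest logs are overwritten."""
    if daily_log_gb <= 0:
        raise ValueError("daily_log_gb must be positive")
    return local_storage_gb / daily_log_gb


def safe_to_collect(local_storage_gb, daily_log_gb, collection_interval_days):
    """True if logs survive locally for longer than the gap between collections."""
    return retention_days(local_storage_gb, daily_log_gb) > collection_interval_days


# Hypothetical sensor: 50 GB of local log storage, 4 GB of logs per day.
print(retention_days(50, 4))       # 12.5
print(safe_to_collect(50, 4, 7))   # True: weekly collection loses nothing
print(safe_to_collect(50, 4, 14))  # False: fortnightly collection loses data
```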

What are the options to improve configuration per sensor type, and/or augment modules?

There’s often a lot more juice you can squeeze out of technology you already have, by optimising the configuration of existing modules. Once you’ve done that, you can evaluate purchase of extra modules if it makes sense.


What are the change management costs and considerations to do that?

Costs of change are not only the purchase of a new module, but the effort required from relevant teams to implement and test those changes for any production impact.


What network restrictions are there in shipping data (e.g. from source to collector to central storage)?

If you want to correlate a data source from one sensor with another, what are the limits you may hit in terms of volume, and what might that mean for data filtering that you have to implement?

Armed with this information, you’ll now be in a position to run quick up-front assessments of vendor technologies with questions that determine how they can help solve the problems you’ve identified.

What attack surfaces do you cover?

This tells you the scope a vendor is aiming to cover, and where they can help you detect and respond to attacks by threat actors. Machine learning is best applied to ‘bounded problems’. So generally, the wider the scope, the more machine learning models the vendor will need, and the more complex their models will be for them to manage. This increases the likelihood of false positives, ‘missed detections’, or difficulty updating models.

  • What attack techniques do you cover on those surfaces, and what stage of the MITRE ATT&CK framework do they correspond to?
  • What are the specific detection problem sets that you solve for the above?
  • What are their strengths, weaknesses and limits?
  • What are the high and low bars for false positive rates in customer deployments and why?

Vendor responses to the questions above will show you how granularly they think about attacks, and this enables you to question the efficacy of particular models for certain detections that are of concern to you, based on credible threat actors that your business faces. You’ll also get a feeling for how open the vendor is about their results in other client deployments, and whether they focus on understanding those metrics to drive improvement in their product roadmap for their data science teams.
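When a vendor quotes false positive figures, it helps to be clear about what is being measured. The Python sketch below works from triaged alert counts, which is usually all a security team can observe; the numbers are invented. Note that it reports the share of alerts that turned out to be benign, which is not the same thing as a statistical false positive rate (that would also need a count of benign events that did not alert).

```python
def alert_quality(true_positives, false_positives):
    """Summarise triage outcomes for a set of investigated alerts."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("no triaged alerts")
    return {
        "precision": true_positives / total,
        "false_positive_share": false_positives / total,
    }


# Hypothetical month of triage: 200 alerts investigated, 12 were real.
print(alert_quality(true_positives=12, false_positives=188))
```

Asking a vendor which of these measures their figures refer to, and over what alert volume, makes answers from different customers comparable.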

  • What specific data sources, and what fields within those data sources, are vital to solve the detection problem sets you focus on?
  • What are the ideal vs bare minimum data sets you need to deliver value?
  • What are your dependencies on modules, configuration and settings of other technologies that generate data your technology either needs to, or can, consume?

It’s vital to establish what prerequisites you need to work on (if any) to get the most value from a Proof of Concept (PoC). If a vendor doesn’t have a very clear list of these, then it’s a red flag. There are often significant dependencies in data analytics on downstream or upstream technologies. The closer you can get during a PoC to understanding what a technology delivers out of the box, with no extra work on your part, the more you can focus on the effort it takes to configure that new technology, and on testing it to see what it does and does not detect. Time wasted during PoCs on sorting out dependencies means your team gets less hands-on time putting the vendor you are assessing through their paces.

  • What visibility do you provide of the detection logic available in your technology (i.e. rule sets / machine learning models)?
  • How do you help us understand the rationale for your choices in how you’ve applied that logic?
  • What is the process for tuning detection logic and how adaptable is that for your customers?

You want to know how flexible the technology is when it comes to adding your own detections. Some vendors do not allow you to tune or add custom detections. This means you have to have complete faith in their efficacy. Others allow custom detections but it can be a lot of effort to implement them. You’ll also want to look at how you can tune out noise from machine learning models, and what that means you may miss. This is especially the case where vendors use anomaly detection (‘something is different’) vs attack profile identification (‘these things combined look like a malicious actor is doing something’).

  • How, and in what formats, do customers access and extract the data you process or generate?
  • What are the most effective detection playbooks used today by your most advanced customers?
  • How long will it be before we see value from a test of your technology, how do you define the dimensions of ‘value’ we should measure in, and what are the conditions that must be true for you to stand behind delivering that in ‘n’ number of days?

What are other customers doing with the data output from this technology? How are their teams changing the way they operate? Answers to these questions will give you a feel for where you should be focusing when you request 3-4 reference calls with customers who have a similar problem profile to you. It’s always interesting to ask how many of the customers are API-only consumers of the data a technology outputs, as this tells you how much development focus the vendor probably puts on making their APIs resilient and easy to query.

With what you know about your sensor environment, you can now evaluate if the technology you are assessing plugs a gap you have, will improve process efficiency, could replace another technology, adds another level to your existing defences or can achieve several of these outcomes. It’s not an easy task to do this, or particularly quick. But investment in this kind of process is essential to making sure that you get a good cost-to-value ratio from your spend on technology – and it’s far cheaper in the long run than buying something that sits on a rack (or in a cloud), which your security team can’t use to solve the problems they face.

This article was contributed by Jon Hawes, who will be speaking at this year’s Amsterdam event.