The top five pitfalls to avoid when implementing SOAR
I was recently in a room full of CISOs and the topic-du-jour was SOAR. The headline on the PowerPoint slide read “SOAR or SORE?” — a joke to get the conversation started.
Given limited budgets and the general shortage of experienced security talent, most of the CISOs in the room were already looking to automation to help bridge the gap between their operational reality and the unrealistic expectations of the rest of the business. But was automation really closing that gap? “No” was the popular response, which came amid a smattering of “not yets” and “it’s starting to.” It seemed that at least some orgs were beginning to implement systems that were automating workflows and streamlining security operations.
But is merely beginning to automate things here and there enough to say that you’re doing SOAR?
What is SOAR?
As it turns out, it all depends on how you interpret the definition of SOAR. Some say it’s not SOAR unless it can operate independently without human intervention. Others claim SOAR is a ticketing system or a case management system, though not the IT ticketing system, and maybe not a ticketing system at all. All of these definitions and interpretations can make your head spin. Whether or not you believe robots can replace humans entirely (we don’t, by the way), I don’t think these disagreements about definitions matter much.
For our purposes, let’s define SOAR broadly. It’s some set of technologies that helps you integrate the security tech you already have (and will add over time) and weave it together so it saves you time and helps your humans scale better. Yes, I understand that Python.org or /bin/bash probably both qualify as SOAR according to this definition. Let’s assume for a moment I’m not stressed out about that. A little history explains why.
But what really is SOAR?
Longer ago than I’d like to think about, I worked on the security team at America Online. At the time, we had to build much of our technology in house because what was available commercially (or even as open source) wouldn’t work at our scale “out of the box.” It’s the same sort of problem folks like Netflix and Google faced years later. So we ended up building middleware (pretty sure this is what SOAR was called back then) to streamline a variety of security workflows to support identity and access management, authorization management and incident response.
One of the initial challenges we had when trying to automate from point A to point Z was that the “business rules” kept changing under our feet. That made it hard to code the steps in between. So we built a generic system to automate the parts that were largely static, left a few key steps in the middle for humans and then technology picked the workflows back up again and finished the process. Here’s a quick IAM example: 1) let software collect information from managers about who needed access to what; 2) let the software provision access directly where APIs were available and 3) create tickets for humans where APIs were absent. Then, when all access was provisioned, handle the notifications and track the access over time to ensure it was disabled if not used in 90 days or if not renewed by managers after 365.
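To make that IAM example concrete, here is a minimal Python sketch of the same split between software and humans. It is not the system we built at AOL; the names (AccessGrant, provision, should_disable) and the in-memory set of "systems with an API" are assumptions made up for illustration. Only the 90-day and 365-day thresholds come from the description above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, Set

# Thresholds from the example: disable if unused for 90 days
# or not renewed by a manager after 365 days.
UNUSED_LIMIT = timedelta(days=90)
RENEWAL_LIMIT = timedelta(days=365)


@dataclass
class AccessGrant:
    """Hypothetical record of one person's access to one system."""
    user: str
    system: str
    granted_on: date
    last_used: Optional[date] = None
    last_renewed: Optional[date] = None


def provision(user: str, system: str, systems_with_api: Set[str]) -> str:
    """Provision directly where an API exists, otherwise hand off to a human."""
    if system in systems_with_api:
        return f"provisioned {user} on {system} via API"
    return f"ticket opened: grant {user} access to {system}"


def should_disable(grant: AccessGrant, today: date) -> bool:
    """Apply the 90-day unused rule and the 365-day renewal rule."""
    last_activity = grant.last_used or grant.granted_on
    last_approval = grant.last_renewed or grant.granted_on
    unused_too_long = today - last_activity >= UNUSED_LIMIT
    renewal_overdue = today - last_approval >= RENEWAL_LIMIT
    return unused_too_long or renewal_overdue
```

The part worth noticing is the shape: the static rules live in code, and the step that can't be automated yet turns into a ticket rather than a blocker.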
Overall this system worked well, but maintenance costs were heavy because systems across the company kept changing … which meant the APIs and underlying data models changed constantly too.
It should be called SORE
Walk into any development shop and the engineers are probably familiar with the phrase “it’s like changing a jet engine mid-flight.” It’s the age-old problem of introducing (or replacing) technology without damaging the business. And it’s HARD. Making significant changes to a technology platform that’s used heavily (or used at all) is a painstaking endeavor … and when the technology in question was built over time — “organically” — rather than as part of a well thought out architecture that could withstand future changes, well … most engineering teams find themselves drowning in tech debt.
So let me propose this: SOAR is not “orchestration and response.” Those aren’t the activities you’re doing when you implement SOAR. SOAR is SORE.
Jokes and conversation starters aside, it really should be “Security Operations and Response Engineering.” This is an engineering problem and should be treated as such.
What do I mean by “as such”? I’m glad you asked.
Five pitfalls when implementing SOAR
Because I’ve learned the hard way what not to do, I’m now sharing with you the five mistakes I wish I hadn’t made.
Pitfall 1: Automating everything
The firefighters were exhausted from putting out both small and large fires over the past week. Instead of jumping to automation as the solution, maybe it’s worth taking a look at why there are so many fires in the first place. When your city’s made primarily of wood and lumber yards are located on the banks of the river — which let fires quickly move from one side of town to another — you’ve got some pretty compelling reasons to make architectural changes before turning to automation.
SOAR is no different. Take the time to understand what’s driving the volume of your work and see if there are architectural changes or tuning you can do upstream in your security infrastructure before you automate.
Pitfall 2: Listening to your analysts
Just kidding — you should totally listen to your analysts. But evaluate what they say in the context of data. As a general rule, if you ask an analyst what to automate, they’ll describe an annoying time-consuming process they had to go through last Tuesday. What they won’t tell you is that it’s the only time they’ve had to do that this month. It’s a recent enough memory though and painful enough that they don’t want to have to do it again … so that’s what’s top of mind.
Beyond anecdotal recaps of something an analyst thought was tedious, you need metrics. As you figure out what to automate, metrics will help you make that decision and prioritize your engineering investments. Fixing an annoying workflow during an investigation might save one person a half hour once a month, but cutting one minute from a triage step (that nobody realizes they’re doing because it’s muscle memory at this point) could save everyone on your team a half hour a month. Good instrumentation and metrics management will help you figure out what to automate next (pro tip: check out tools like Datadog and Tableau to organize, visualize and analyze your data).
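The arithmetic behind that comparison is simple enough to sketch in Python. The numbers below are illustrative assumptions, not measurements: a ten-person team, and a triage step each analyst runs about 30 times a month. The Step class and rank function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """A candidate for automation, described by how often it really happens."""
    name: str
    minutes_saved_per_run: float
    runs_per_person_per_month: float
    people_affected: int

    def minutes_saved_per_month(self) -> float:
        return (self.minutes_saved_per_run
                * self.runs_per_person_per_month
                * self.people_affected)


def rank(steps: List[Step]) -> List[Step]:
    """Order candidates by total team time saved, biggest win first."""
    return sorted(steps, key=lambda s: s.minutes_saved_per_month(), reverse=True)


# Assumed figures for illustration only.
annoying_workflow = Step("annoying investigation workflow", 30, 1, 1)
triage_step = Step("one minute off a triage step", 1, 30, 10)

for step in rank([annoying_workflow, triage_step]):
    print(f"{step.name}: {step.minutes_saved_per_month():.0f} minutes/month")
```

Under those assumptions the boring triage tweak saves ten times as much team time as the workflow everyone complains about, which is exactly the kind of thing you only see once you measure frequency as well as pain.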
Pitfall 3: Building brittle integrations
I like to think about SOAR platforms as being measured best by TTP: time-to-Python. How much will your SOAR platform do for you before you have to write Python? It’s usually measured in minutes. Beyond lambasting the limitations of SOAR, though, let’s take a look at the software you write to achieve the orchestration you want.
If your security team is like most, you’re likely to swap out at least one technology in each tech category every four years or so. Maybe your SIEM (Security Information and Event Management) tech sticks around longer (even though you wish it wouldn’t). To avoid the pain of “rebuilding everything” each time you swap out a security product, you’ve gotta make one crucial investment: adding an abstraction layer between “analysis” and “security product.”
With an effective abstraction layer, you normalize data and queries across similar technologies. For example, one endpoint tech becomes no different from another upstream in the technology stack. Your analysts and your analytics can say “get me this file,” and your SOAR architecture will figure out how to do that with Tanium today — and it won’t skip a beat if you try to do it with Carbon Black tomorrow. Anything short of this and you’ve built a brittle integration that you’ll need to rebuild later.
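Here is a rough Python sketch of what such an abstraction layer could look like for the “get me this file” case. The two vendor classes are in-memory stand-ins invented for this example; they do not call any real Tanium or Carbon Black API, and a real adapter would wrap whatever SDK or REST interface the product actually exposes.

```python
from abc import ABC, abstractmethod
from typing import Dict, Tuple


class EndpointTool(ABC):
    """The one interface your analysts and analytics talk to."""

    @abstractmethod
    def get_file(self, host: str, path: str) -> bytes:
        """Return the contents of a file from an endpoint."""


class VendorAEndpoint(EndpointTool):
    """Hypothetical product that indexes files by (host, path) pairs."""

    def __init__(self, files: Dict[Tuple[str, str], bytes]) -> None:
        self._files = files

    def get_file(self, host: str, path: str) -> bytes:
        return self._files[(host, path)]


class VendorBEndpoint(EndpointTool):
    """Hypothetical replacement product with a different data model."""

    def __init__(self, store: Dict[str, bytes]) -> None:
        self._store = store

    def get_file(self, host: str, path: str) -> bytes:
        # Translate the common request into this vendor's "host:path" key.
        return self._store[f"{host}:{path}"]


def collect_evidence(tool: EndpointTool, host: str, path: str) -> dict:
    """Upstream code: it never needs to know which vendor is plugged in."""
    data = tool.get_file(host, path)
    return {"host": host, "path": path, "size_bytes": len(data)}
```

Swapping products then means writing one new adapter class. Everything upstream of collect_evidence stays as it was.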
While you’re at it, watch out for other areas that might be brittle. If you’re automating a process you don’t understand well, it’s liable to break readily. On the topic of things breaking: expect your processes, your technology, and even your people to fail from time to time. The automation you build needs to stand up to those failure conditions without creating more work.
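One way to build for failure, sketched in Python under some assumptions: retry a flaky step a few times, and if it still fails, hand a single clear item to a human instead of crashing or flooding the queue. The run_step function and the plain list standing in for a ticket queue are hypothetical; your own platform will have its own retry and escalation hooks.

```python
import time
from typing import Callable, List


def run_step(action: Callable[[], str],
             human_queue: List[str],
             description: str,
             attempts: int = 3,
             delay_seconds: float = 0.0) -> str:
    """Run an automated step, retrying on failure, then escalate once."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:  # any failure counts; a real system may be pickier
            last_error = exc
            if attempt < attempts:
                time.sleep(delay_seconds)
    # Every attempt failed: create exactly one work item for a person.
    human_queue.append(
        f"{description} failed after {attempts} attempts: {last_error}"
    )
    return "escalated"
```

The design choice here is that a broken integration degrades into one well-described task for an analyst, so the failure costs a person a few minutes rather than an afternoon of figuring out what the automation was trying to do.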
Pitfall 4: Assuming you’re getting better
The old management adage goes like this: “What gets measured gets done.” If you really want to improve your orchestration and automation, it’s vital you know where you are today so you can recognize (and celebrate) improvement as it happens. Some of this you’ll do through operational metrics that you’ve put in place as part of a security operations and response program. Fixating on this, though, could cause you to lose sight of the big picture.
Imagine for a moment you’ve built a security operations program that operates effectively but is optimized to find and stop nation-state attackers. That’s great if you’ve got other countries all up in your business every other week, but less effective when garbage spear phishing results in business email compromise every day. You need both capabilities and by fixating on just a subset of your metrics, you might be celebrating myopia.
Security operations, whether SOAR-enabled or not, operates in the context of the broader risk management environment. If you’re making conscious decisions across this broader scope, you’re less likely to over-invest in one capability to the detriment of another that you need even more. There are lots of ways to get this done, but we’re fans of the NIST Cybersecurity Framework. It’s comprehensive, helps guide your thinking, and it’s not hard to get started. As you continue developing your security program, take another measurement. Mixing internal assessments with less frequent external ones like NIST will ensure you’re seeing the forest for the trees and help you mitigate your own bias.
Pitfall 5: Getting comfortable
It’s rare to find a CISO who’s complacent. Most are perpetually on edge, somewhat (who are we kidding?) paranoid and wondering if today is the day everything goes down in flames.
Still, when you spend so much of your time making sure the plates keep spinning, it’s tough to take the time to inject yet more chaos into the system to see how the team handles it. Staying with the big picture theme, tabletop exercises are great ways to think through how you’d respond in the face of a real problem.
When you think about getting comfortable in the context of SOAR, realize that the automation you’ve built ages. The processes you’ve solidified into automation may have worked well when they were built, but as the business has changed around your implementation, do the same assumptions hold true? Or is it time to rethink the process and therefore the automation? One of the most effective ways to figure this out is through scenarios.
We run tabletop exercises every quarter at Expel and it never ceases to amaze me the breadth of interesting discoveries we make, far afield from security technologies, let alone SOAR. It really puts things in perspective. Still, who really wants to sit in a room and plod through boring and stressful scenarios?
Fortunately, we’ve got something that might help. If you enjoy games (especially D&D) and are willing to shake things up a little with your executive team, check out Oh Noes! It’s a security-focused tabletop exercise, D&D style. Bring some Doritos and you’ve got a social event and risk-management exercise in one.
Where do you go from here?
Avoiding pitfalls seems like a tough thing to do today. “Okay, I’ll watch out for those problems,” you might think. But we’re all on a journey as we look to improve what we’re doing from a day-to-day security perspective. Whether you’re in the middle of SOAR implementation or it’s still on the far distant horizon at your org, there are things you can do today to help you prepare or adapt.
- Take a measurement. Figure out where you are from a security program perspective.
- Inject some (healthy) chaos. Try Oh Noes! and entertain your team.
- Contemplate your metrics. Evaluate if you’ve got the right ones in place.
If you’re still wondering if SOAR is right for your org or how you might go about implementing it, let us know — we’d love to talk.