Boston Waterfront - April 2022

The Tar Pit of CSPM

It’s been a little less than five years since I moved from being a media production cloud nerd to being a cloud security nerd. As I ponder what I’m going to do next, I want to reflect on some of the things I got right and some that didn’t work out as expected.

[ Update: I use the term CSPM but never officially define it. In this case, I’m using it to mean Cloud Security Posture Monitoring. There are some who define CSPM as Cloud Security Posture Management, but this post is about detecting issues, not fixing them. I discuss auto-remediation in a separate post. ]

When I joined the Turner security team in 2017, they had a primitive CSPM tool as part of a primitive CWPP (before either of those acronyms were coined). The CSPM was a regurgitation of the CIS Benchmark circa 2015. It had to be installed by account admins by hand. The vendor provided a PDF with the screenshots and JSON for creating the cross-account role. This cloud security company hadn’t even figured out how to provide a CloudFormation template! I had to write one. It was a mess and the biggest pain point I had with our security team prior to joining them.

So naturally, fixing CSPM became my first priority. I wrote a cloud security standard, so we had a policy cover for ignoring CSPM findings not covered by the standard. That was well-received by account owners and development teams because it was the first time we provided prescriptive guidance to teams on what to do. Part of that standard also required every cloud account to have both an Executive and Technical owner. We explicitly stated: “the Executive Sponsor is accountable to Security & Finance for all the activities in the account”. Defining this “bubble of accountability” at the cloud tenant level allowed us to get security issues, either potential incidents, vulnerabilities, or misconfigurations, directly to someone empowered to address them.

We also killed that legacy CSPM and brought in an open-source one. CloudSploit went well beyond the basics and implemented checks for the latest AWS services. Moving to open-source also allowed us to scale our cloud footprint without going back for more money every few months. We took the output of the CloudSploit engine, and an amazing intern turned it into Excel spreadsheets. We had account owners getting a list of risk-aware (and usually actionable) cloud issues every week. We quickly saw some teams engage in fixing things and even reach out for security advice.

Everyone talks about how social media is bad for your mental health, but what about Excel?

About a year after the merger of Turner, HBO, and Warner Bros., the focus was extending the existing program to the other WarnerMedia companies. That meant lots of outreach to teams to explain our program and what we were trying to do, deploying our visibility into new environments, and trying to identify the bubble of accountability for all the cloud accounts.

I decided to leave once AT&T decided to vampirically suck the lifeblood from WarnerMedia as a whole and the WarnerMedia security team in particular. So in the middle of the pandemic, I jumped to another media company. One that would become very ironic about eight months later.

I replicated the Turner program into Discovery. I wrote them a cloud security baseline. My team and I implemented CSPM based on that baseline, and then we started sending out weekly scorecards. We deployed Antiope and worked with the cloud teams to identify the executive & technical owners for all the accounts. Having done this before, I had naive hopes that we’d push out CSPM and quickly move on to bigger and better things like IaC Pipeline scanning, auto-remediations, etc. Instead, CSPM morphed into security theater and making dashboards look good for auditors.

CSPM only covers one aspect of cloud security risk. There is a large aspect of cloud security that can’t be captured by simple calls to the cloud provider’s API. CSPM needs to be linked to a robust cloud-aware application security program. CSPM can detect a security group that exposes RDP to the world. It can’t easily detect the segmentation and routing configurations of a thousand or more VPCs created over several years by dozens of different cloud teams. CSPM doesn’t build a detection strategy for compromised applications and account credentials.
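To make the "RDP to the world" example concrete, here is a minimal sketch of what that kind of check looks like. It assumes security group data in the shape returned by the EC2 DescribeSecurityGroups API. The function names are made up for illustration and are not taken from CloudSploit or any other tool, and it only considers TCP 3389.

```python
RDP_PORT = 3389
WORLD_CIDRS = {"0.0.0.0/0", "::/0"}


def rule_covers_port(rule, port):
    """Return True if an ingress rule's protocol and port range include the TCP port."""
    protocol = rule.get("IpProtocol")
    if protocol == "-1":  # "-1" means all protocols and all ports
        return True
    if protocol != "tcp":
        return False
    return rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535)


def rule_is_world_open(rule):
    """Return True if the rule allows traffic from any IPv4 or IPv6 address."""
    cidrs = [r.get("CidrIp") for r in rule.get("IpRanges", [])]
    cidrs += [r.get("CidrIpv6") for r in rule.get("Ipv6Ranges", [])]
    return any(cidr in WORLD_CIDRS for cidr in cidrs)


def find_world_open_rdp(security_groups):
    """Return the GroupId of every security group exposing RDP to the world."""
    findings = []
    for group in security_groups:
        for rule in group.get("IpPermissions", []):
            if rule_covers_port(rule, RDP_PORT) and rule_is_world_open(rule):
                findings.append(group["GroupId"])
                break  # one finding per group is enough
    return findings
```

In practice the input would be something like the "SecurityGroups" list from boto3's describe_security_groups call. The point is how little context a check like this needs: one API response, one rule. None of the routing, segmentation, or detection questions fit in that shape.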

Nor can CSPM help with the problem of “shift-left”. CSPM lives at the right end of the spectrum. On paper, detecting cloud security misconfigurations in IaC is easy. The reality is, in a large organization, it’s nearly impossible. First off, not all teams even use CI/CD and IaC. Sometimes ClickOps is the right business decision.

The second problem is that IaC and CI/CD aren’t standardized. “Tell a developer to close a security group and they will. Tell a developer to change their CI/CD or IaC to prevent a security group ever getting opened, and you’ve made an enemy for life”. For security to engage in the religious wars between CloudFormation, Terraform, CDK, and Pulumi wasn’t an option when I had only a small team and the enterprise had no overarching enterprise engineering mindset.

My greatest regret was that after getting CSPM into a risk-based approach and identifying how to make owners accountable, I could never move beyond that. The sheer number of CSPM findings made helping teams close issues, troubleshooting the CSPM tool, or coming up with better ways to present the data a full-time job for me and my respective teams. The goal stopped being to make cloud security better and became “make this leader’s metrics look better”.

Sisyphus: how it started, how it’s going

Every requirement of a security team on the business is like a tax. It slows things down and makes us less competitive. If you can’t articulate the risk, you don’t understand the risk, and if you don’t understand the risk, you’re engaging in security theater and taxing the business for no real gain.

Except in a few small pockets of the organization, cloud security never moved beyond a basic vulnerability management function. Where pockets did move beyond a vulnerability management function, it was the teams themselves that moved to auto-remediation and GuardRails, not CloudSec.

[ Check out the follow-up post for my Philosophy of Prevention ]