AWS re:Play 2018 - Skrillex

Rethinking Config

A few folks have asked, “How does Antiope differ from AWS Config?” Darn good question. I had looked at Config back in 2015 or so, and found it to be not that useful. If I dug around enough in the Console I could figure out who made a change, but honestly I have CloudTrail for that. When I took on the Cloud Security role I briefly looked at it with an eye towards “We should enable this everywhere, and dump the data to S3 in case someday we need it for an investigation”.

Config wasn’t useful for the primary use-case I had. That being, “In which of our 250 accounts, 15 AWS regions, and 300+ VPCs does the IP address 10.20.35.40 live?” Or “Does that S3 Bucket actually belong to us, and if so which account?”

At the scale my SOC deals with, we need to narrow things down by account & region pretty quickly. And it was very easy to fire off a herd of Lambdas to assume-role into each AWS account to go and gather up that info, dump it into JSON and let Splunk index the data. That was the prototype of the system that became Antiope.

Config had other issues too. At US $2 per-rule per-region, a single open-security-group check would cost us $90k/yr! There are about 45 different rules we use for the scorecards. I’m pretty sure I’d be fired if I upped our AWS bill by $4 million and change.
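The arithmetic behind those two numbers can be sketched in a few lines of Python. This assumes the $2 is charged per rule, per region, per account, per month, which is the reading that reproduces the $90k figure. The constant and function names are only for illustration.

```python
# Back-of-the-envelope Config Rules cost, using the figures quoted above.
# Assumption: $2 is billed per rule, per region, per account, per month.
ACCOUNTS = 250
REGIONS = 15
RULE_PRICE_PER_MONTH = 2.00  # USD
RULES_IN_USE = 45            # rules used for the scorecards


def annual_rule_cost(rules: int) -> float:
    """Yearly cost of running `rules` Config rules in every account and region."""
    return ACCOUNTS * REGIONS * RULE_PRICE_PER_MONTH * 12 * rules


print(f"One rule everywhere: ${annual_rule_cost(1):,.0f}/yr")
print(f"All {RULES_IN_USE} rules: ${annual_rule_cost(RULES_IN_USE):,.0f}/yr")
```

One rule works out to $90,000 a year, and 45 of them to a little over $4 million.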

Since 2015, AWS has made some improvements to Config. You can now federate Config searches across accounts. Rules are still $2 a pop, but I hear that is going to change.

So the question I had coming out of re:Invent was:

**Does Antiope’s inventory function just re-invent the AWS ConfigService wheel?**

I don’t think so. Here is why:

Cost

At scale, Antiope is still cheaper than AWS Config. Config costs about $3/mo per thousand items recorded. Antiope costs are based on Lambda usage:

If you had 250 accounts, and you ran it every thirty minutes, you’d invoke about 360,000 Lambdas per month per resource-type recorded. Right now Antiope records 12 different resource-types. That’s 4.32m Lambda invokes. Everyone gets 1m free Lambda invokes, so the next 3.5 million requests cost 70 cents. The average Lambda duration is about 18sec and uses the minimum 128MB of RAM. Doing the math:

( (360,000 invokes per resource-type per month * 18 resource-types * 18sec avg duration * 0.128GB Ram used ) - 400,000 free GB-Seconds ) * $0.00001667 per GB-Second = $242 per month.

That’s $14 per month per resource-type or about $1 per AWS Account for the lambda executions.
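For anyone who wants to plug in their own numbers, here is a rough sketch of the same calculation in Python. It uses the inputs from the formula above (18 resource-types, 18 seconds, 128MB) as defaults. It also assumes a 30-day month, the standard Lambda free tier (1M requests and 400,000 GB-seconds), and a request price of $0.20 per million. The function and parameter names are made up for this example.

```python
# Rough Lambda cost model. Defaults mirror the formula above.
# Assumptions: 30-day month, free tier of 1M requests and 400,000 GB-seconds,
# $0.20 per 1M requests, $0.00001667 per GB-second.
PRICE_PER_GB_SECOND = 0.00001667
PRICE_PER_MILLION_REQUESTS = 0.20
FREE_GB_SECONDS = 400_000
FREE_REQUESTS = 1_000_000


def monthly_lambda_cost(accounts=250, runs_per_day=48, days=30,
                        resource_types=18, avg_duration_sec=18,
                        memory_gb=0.128):
    """Return (invokes_per_type, total_invokes, compute_cost, request_cost)."""
    invokes_per_type = accounts * runs_per_day * days
    total_invokes = invokes_per_type * resource_types
    gb_seconds = total_invokes * avg_duration_sec * memory_gb
    compute_cost = max(gb_seconds - FREE_GB_SECONDS, 0) * PRICE_PER_GB_SECOND
    billable_requests = max(total_invokes - FREE_REQUESTS, 0)
    request_cost = billable_requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return invokes_per_type, total_invokes, compute_cost, request_cost


per_type, total, compute, requests = monthly_lambda_cost()
print(f"{per_type:,} invokes per resource-type, {total:,} in total")
print(f"compute: ${compute:,.2f}/mo, requests: ${requests:,.2f}/mo")
print(f"per resource-type: ${compute / 18:,.2f}, per account: ${compute / 250:,.2f}")
```

The request charge is tiny next to the GB-second charge, so duration and memory size are the knobs that matter.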

There are other costs that begin to add up. To index & search all these resources we have a 5-wide m4.large.elasticsearch search cluster. For the approximately 100k resources we index every thirty minutes we end up paying quite a bit in S3 PUT requests. SQS and DynamoDB cost about another $200/mo. All told I estimate running Antiope to be about 0.1% of our total AWS monthly bill.

Based on observation, Config Service is costing us between 0.4% and 1.5% of each account’s AWS spend (depending on the activity level of the AWS account).

Resource Support

For being a service that’s several years old, the number of AWS resources that AWS Config will monitor is a lot fewer than what you’d expect. Notably absent are:

  • Secrets Manager Secrets
  • Elastic Container Registry (ECR)
  • Elastic Container Service (ECS) Clusters & Tasks
  • ElasticSearch Domains
  • KMS Keys
  • Direct Connect
  • Route 53 Zones & Domains
  • EKS

In Config’s defense, it is a change management tool. With Antiope, I’m building a security vulnerability hunting tool and a compliance dashboard. I worry about anything that supports resource policies because those are the things that can be accidentally made public and show up on an UpGuard, MayhemDayOne or The Register report.

Config Strengths

Antiope’s inventory is a scheduled task. Config detects changes in real-time and saves those results. If you have a very dynamic environment, Config will catch resources that spin up and shut down in between Antiope runs.

Now that Config supports aggregating data into a single master account, it provides a reasonable ability to find resources across AWS accounts - a key concern when you adopt a multi-account strategy. But again, only for the resources Config supports. And AWS doesn’t ship Config data across regions, so if you need to find a needle in your cloud haystack, you still have to do that search in each region.

Config is a required service to support AWS’s new Security Hub. As such if you want to leverage Security Hub, you will need to enable Config.

So what’s next

Right now, Antiope won’t be leveraging AWS Config Service. I don’t want to turn it on across my whole environment. The cost impact will be high, and since it doesn’t cover all the resources I care about, it won’t eliminate the need for the Antiope Inventory stack. That said, if someone wants to run Antiope and use Config, I don’t want to make that extra hard. Plus the Config Service’s object format is well thought out.

So, I’ve re-factored how Antiope will store resource data as an object in S3. It now leverages the Config format with the Config Service’s dictionary elements.

An Antiope Resource’s required elements are:

resource_item = {}
resource_item['awsAccountId']                   = # 12 digit account ID (Same as config)
resource_item['awsAccountName']                 = # Name of the account as seen by the parent (Not part of Config)
resource_item['resourceType']                   = # Leverages the same resource types as Config & CloudFormation
resource_item['source']                         = "Antiope" # Only part of Antiope. Config doesn't populate this
resource_item['configuration']                  = # This is where config puts the json representation of the resource returned by the describe call
resource_item['configurationItemCaptureTime']   = str(datetime.datetime.now())
resource_item['resourceId']                     = # This is a unique identifier. It is also the object name

And the optional elements are:

resource_item['awsRegion']                      = # for regional services
resource_item['tags']                           = # as a python dict of { key: value }, and not the way returned by describe
resource_item['supplementaryConfiguration']     = {} # A dict of other elements related to the main resource, but not part of the original describe call.
resource_item['resourceName']                   = # an Antiope custom that is populated where appropriate
resource_item['ARN']                            = # Not all resources have ARNs, but if they do, it goes here.
resource_item['resourceCreationTime']           = # If provided by the describe call, this is populated
resource_item['errors']                         = {} # Any errors returned by boto when gathering data for supplementaryConfiguration

This should closely mirror what Config does. ‘source’ is an addition which would allow for future disambiguation between resources created by Config and by Antiope. Additionally, while ‘supplementaryConfiguration’ is the Config name, how and what Antiope puts in there is quite different.
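To make the layout concrete, here is a minimal sketch of building one of these items for an S3 bucket. It is not Antiope's actual code: the helper names (flatten_tags, build_resource_item), the account ID, the account name and the bucket name are all made up for illustration. The tag conversion assumes the common describe-call shape of a list of Key/Value pairs.

```python
import datetime
import json


def flatten_tags(tag_list):
    """Turn [{'Key': k, 'Value': v}, ...] into a plain {k: v} dict."""
    return {t["Key"]: t["Value"] for t in (tag_list or [])}


def build_resource_item(account_id, account_name, resource_type, resource_id,
                        configuration, region=None, tag_list=None):
    """Assemble a resource item with the required elements listed above."""
    item = {
        "awsAccountId": account_id,
        "awsAccountName": account_name,
        "resourceType": resource_type,
        "source": "Antiope",
        "configuration": configuration,
        "configurationItemCaptureTime": str(datetime.datetime.now()),
        "resourceId": resource_id,
        "supplementaryConfiguration": {},
        "errors": {},
    }
    # Optional elements are only added when there is something to record.
    if region:
        item["awsRegion"] = region
    if tag_list:
        item["tags"] = flatten_tags(tag_list)
    return item


# Hypothetical example values, not real resources.
example = build_resource_item(
    account_id="123456789012",
    account_name="example-sandbox",
    resource_type="AWS::S3::Bucket",
    resource_id="example-bucket",
    configuration={"Name": "example-bucket"},
    region="us-east-1",
    tag_list=[{"Key": "owner", "Value": "secops"}],
)
# default=str keeps any datetime values from a describe call serializable.
print(json.dumps(example, indent=2, default=str))
```

Keeping tags as a flat dict makes them much easier to search on than the list-of-pairs form.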

I’m reasonably happy with this layout. It’s not ideal for ElasticSearch due to how it handles arrays of objects. However, I’ve conducted a few threat/vuln hunts with Antiope and I’m reasonably happy with the results. I think Antiope is pretty close to done for a first release. I just want to make sure I can add some of my company’s needed customizations before I promise not to change a bunch of stuff.