How Netflix Automatically Strips AWS Permissions at Scale with Repokid
Hook
Netflix manages over 1,000 AWS accounts where developers ship code dozens of times per day. Every new service starts with overly permissive IAM roles because nobody has time to write minimal policies during a deployment sprint. This is how Netflix trims those roles automatically.
Context
The IAM permission problem in high-velocity organizations is simple to describe but brutal to solve: developers need to move fast, so they request broad permissions. Security teams approve them because blocking deployments creates friction. Six months later, you have thousands of roles with access to services they've never touched, and nobody remembers which permissions are actually needed.
Manual IAM audits don't scale. A typical role might have dozens of services with hundreds of actions. Determining which permissions are genuinely required means interviewing teams, reading code, understanding infrastructure dependencies, and risking production outages if you guess wrong. AWS IAM Access Advisor provides data about which services a role has used, but translating that into safe policy changes across hundreds of accounts is still a human-intensive process. Netflix built Repokid to automate this entire pipeline: collect usage data, calculate removable permissions, apply changes, and maintain an audit trail—all without human intervention once configured.
Technical Insight
Repokid's architecture splits the least-privilege problem into two distinct tools. Aardvark runs on a schedule to collect IAM Access Advisor data from every account and stores it in a queryable format. Repokid consumes this data to make permission removal decisions. This separation allows Aardvark to scale horizontally across accounts while Repokid handles the more complex logic of policy calculation and updates.
The core workflow starts with role discovery. Repokid assumes a cross-account IAM role in each managed account, enumerates all roles, and stores metadata in DynamoDB. For each role, it queries Aardvark for service-level usage data from Access Advisor. If a role has permissions for 15 AWS services but Access Advisor shows activity in only 8, those 7 unused services become candidates for removal—what Repokid calls 'repoable' permissions.
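The repoable calculation described above is essentially a set difference between the services a role's policy grants and the services Access Advisor reports as used. The following is a minimal sketch, assuming actions are written as `service:Action` strings and usage data arrives as a set of service prefixes. The function names `services_granted` and `repoable_services` are hypothetical and are not part of Repokid's API.

```python
def services_granted(actions):
    """Return the set of service prefixes found in 'service:Action' strings."""
    return {action.split(":", 1)[0].lower() for action in actions}


def repoable_services(actions, used_services):
    """Return services the policy grants that show no recorded usage."""
    return services_granted(actions) - {s.lower() for s in used_services}


# Hypothetical role: four services granted, two with recorded activity.
granted_actions = [
    "s3:GetObject",
    "s3:PutObject",
    "sqs:SendMessage",
    "kms:Decrypt",
    "ec2:DescribeInstances",
]
used = {"s3", "sqs"}

unused = repoable_services(granted_actions, used)
print(sorted(unused))  # ['ec2', 'kms']
```

The same subtraction scales to the example in the text: 15 granted services minus 8 used ones leaves 7 candidates for removal.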
Before making changes, every role passes through a configurable filter chain. This is where Repokid becomes production-ready instead of dangerously naive. Filters can exclude roles based on age (don't touch roles created in the last 90 days), blocklists (never modify this specific role), permission boundaries, or custom business logic. Here's what a basic filter configuration looks like:
# Illustrative filter configuration (key names simplified; see Repokid's sample config for the exact schema)
filter_config = {
    "AgeFilter": {
        "minimum_age": 90  # days
    },
    "BlocklistFilter": {
        "blocklist": [
            "ProductionDatabaseAdmin",
            "IncidentResponseRole"
        ]
    },
    "ActiveFilter": {},  # Exclude roles used in last 90 days
    "ExclusiveFilter": {
        "exclusive_services": [
            "iam",           # Never auto-remove IAM permissions
            "organizations"  # Or Organizations access
        ]
    }
}
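The service exclusion is particularly important. Some AWS services are dangerous to remove automatically (IAM, KMS, security services) because infrequent legitimate use might fall outside the Access Advisor observation window. A role that rotates KMS keys quarterly could have KMS permissions removed before the next rotation. Note that the ExclusiveFilter shipped with open-source Repokid selects roles by name pattern rather than by service, so a service-level exclusion like the one shown here is best treated as custom filter logic.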
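To make the filter chain concrete, here is a minimal sketch of how age and blocklist filters could be composed so that a role is only considered when every filter accepts it. This is an illustration, not Repokid's plugin interface: the `Role` dataclass and the `age_filter`, `blocklist_filter`, and `eligible_roles` helpers are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Role:
    name: str
    created: datetime


def age_filter(minimum_age_days):
    """Build a filter that rejects roles younger than minimum_age_days."""
    def keep(role, now):
        return now - role.created >= timedelta(days=minimum_age_days)
    return keep


def blocklist_filter(blocklist):
    """Build a filter that rejects roles whose names are blocklisted."""
    blocked = set(blocklist)

    def keep(role, now):
        return role.name not in blocked
    return keep


def eligible_roles(roles, filters, now=None):
    """Return only the roles that every filter agrees to keep."""
    now = now or datetime.now(timezone.utc)
    return [role for role in roles if all(f(role, now) for f in filters)]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
roles = [
    Role("OrderService", now - timedelta(days=200)),
    Role("NewExperiment", now - timedelta(days=10)),          # too young
    Role("IncidentResponseRole", now - timedelta(days=400)),  # blocklisted
]
filters = [
    age_filter(90),
    blocklist_filter(["ProductionDatabaseAdmin", "IncidentResponseRole"]),
]

print([role.name for role in eligible_roles(roles, filters, now)])  # ['OrderService']
```

Because each filter is an independent predicate, adding custom business logic means appending one more function to the list.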
Once a role passes all filters, Repokid calculates the new inline policy. It doesn't modify AWS managed policies or customer managed policies; it only works with inline policies attached directly to roles. The tool fetches the current inline policy, removes statements for unused services, and generates a replacement policy. Before applying changes, Repokid stores the old policy version in DynamoDB, creating an audit trail and rollback capability.
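The policy rewrite step can be sketched as follows, assuming the inline policy is a standard IAM policy document whose `Statement` is a list and whose actions use `service:Action` form. The `repo_policy` function and the in-memory `history` list (a stand-in for the DynamoDB record) are hypothetical. The sketch leaves `Deny` and `NotAction` statements untouched, which is a conservative assumption rather than a statement about Repokid's exact behavior.

```python
import copy


def repo_policy(policy, unused_services):
    """Return a copy of the policy without Allow actions for unused services."""
    unused = {s.lower() for s in unused_services}
    new_statements = []
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow" or "Action" not in statement:
            # Leave Deny and NotAction statements exactly as they were.
            new_statements.append(copy.deepcopy(statement))
            continue
        actions = statement["Action"]
        if isinstance(actions, str):
            actions = [actions]
        kept = [a for a in actions if a.split(":", 1)[0].lower() not in unused]
        if kept:  # statements left with no actions are dropped entirely
            updated = copy.deepcopy(statement)
            updated["Action"] = kept
            new_statements.append(updated)
    new_policy = copy.deepcopy(policy)
    new_policy["Statement"] = new_statements
    return new_policy


current = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:GetObject", "kms:Decrypt"], "Resource": "*"},
        {"Effect": "Allow", "Action": "ec2:DescribeInstances", "Resource": "*"},
    ],
}

history = []                            # stand-in for stored policy versions
history.append(copy.deepcopy(current))  # save the old version before changing anything
updated = repo_policy(current, {"kms", "ec2"})

print(updated["Statement"])
# [{'Effect': 'Allow', 'Action': ['s3:GetObject'], 'Resource': '*'}]
```

Rolling back in this sketch is just re-applying `history[-1]`. If no statements survive, the inline policy would presumably need to be deleted rather than rewritten, since IAM does not accept a policy with an empty statement list.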
The update process uses cross-account role assumption with carefully scoped trust policies. In each managed account, you create a RepokidRole that trusts the central Repokid account and has permissions to update IAM inline policies:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "iam:GetRole",
        "iam:GetRolePolicy",
        "iam:PutRolePolicy",
        "iam:DeleteRolePolicy",
        "iam:ListRolePolicies",
        "iam:ListAttachedRolePolicies"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "iam:ListRoles"
      ],
      "Resource": "*"
    }
  ]
}
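The policy above covers what RepokidRole may do. The other half is the trust policy that decides who may assume it. Here is a minimal sketch that builds such a trust document, assuming a single role in the central account is the only allowed principal. The account ID `111122223333` and the role name `RepokidInstanceProfile` are placeholders, and `build_trust_policy` is a hypothetical helper.

```python
import json


def build_trust_policy(central_account_id, central_role_name):
    """Trust policy letting one role in the central account assume this role."""
    principal = f"arn:aws:iam::{central_account_id}:role/{central_role_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Placeholder values; substitute the real central account and role.
trust_policy = build_trust_policy("111122223333", "RepokidInstanceProfile")
print(json.dumps(trust_policy, indent=2))
```

Naming a specific role ARN as the principal, rather than the whole account, keeps the set of identities that can rewrite IAM policies as small as possible.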
Repokid's hook system provides extension points for custom logic. You can write hooks that fire before or after repo operations, enabling integration with ticketing systems, Slack notifications, or additional validation layers. A post-repo hook might automatically create a JIRA ticket documenting which permissions were removed from which roles, giving teams visibility into changes even though the process is automated.
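The general shape of a hook system like this can be sketched with a small registry: functions register for a named event and each receives and returns a context dictionary. This is a generic illustration and does not reproduce Repokid's actual hook API. The `register_hook` decorator, the `run_hooks` function, the `after_repo` event name, and the context keys are all hypothetical.

```python
from collections import defaultdict

_hooks = defaultdict(list)


def register_hook(event, priority=0):
    """Decorator that registers a function to run for a named event."""
    def decorator(func):
        _hooks[event].append((priority, func))
        _hooks[event].sort(key=lambda item: item[0])  # lowest priority runs first
        return func
    return decorator


def run_hooks(event, context):
    """Pass the context through each hook registered for the event, in order."""
    for _, func in _hooks[event]:
        context = func(context)
    return context


@register_hook("after_repo", priority=1)
def record_ticket(context):
    # A real hook would call a ticketing or chat API here; this only builds the message.
    summary = f"Removed {', '.join(context['removed_services'])} from {context['role_name']}"
    context.setdefault("notifications", []).append(summary)
    return context


result = run_hooks(
    "after_repo",
    {"role_name": "OrderService", "removed_services": ["ec2", "kms"]},
)
print(result["notifications"])  # ['Removed ec2, kms from OrderService']
```

Passing a context object through each hook lets later hooks see what earlier ones added, which suits chains such as "validate, then notify".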
The DynamoDB schema maintains state across runs. Each role gets a record tracking its repo history: how many times permissions have been removed, when the last repo occurred, which services were removed, and whether any rollbacks happened. This data feeds into decisions about whether a role is stable enough for automated changes or needs human review.
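The per-role record can be pictured as a small data structure. The sketch below mirrors the fields listed above, but the field names, the `RoleRecord` class, and the review rule in `needs_human_review` are assumptions made for illustration. They do not describe Repokid's actual DynamoDB schema or decision logic.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class RoleRecord:
    role_name: str
    repo_count: int = 0                      # how many times permissions were removed
    last_repo: Optional[datetime] = None     # when the last repo occurred
    removed_services: List[str] = field(default_factory=list)
    rollback_count: int = 0                  # how many repos were reverted

    def record_repo(self, services, when):
        """Update the history after permissions were removed."""
        self.repo_count += 1
        self.last_repo = when
        self.removed_services.extend(services)

    def record_rollback(self):
        """Note that a previous repo had to be reverted."""
        self.rollback_count += 1

    def needs_human_review(self, max_rollbacks=1):
        """Hypothetical rule: stop automating once rollbacks reach the threshold."""
        return self.rollback_count >= max_rollbacks


record = RoleRecord("OrderService")
record.record_repo(["ec2", "kms"], datetime(2024, 6, 1, tzinfo=timezone.utc))
print(record.needs_human_review())  # False

record.record_rollback()
print(record.needs_human_review())  # True
```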
Gotcha
Repokid's reliance on AWS IAM Access Advisor data is both its greatest strength and its biggest limitation. The usage data it consumes is recorded per service, not per action. If a role uses S3, the data confirms S3 activity but doesn't distinguish between s3:GetObject and s3:DeleteBucket. Repokid removes entire services, not individual actions, so it can't tighten roles that legitimately need a service but hold overly broad actions within it. A role with s3:* that only reads objects will keep s3:* because Access Advisor shows S3 usage.
The infrastructure requirements are substantial. You need DynamoDB tables with global secondary indexes, a separate Aardvark deployment pulling Access Advisor data continuously, and cross-account IAM roles in every managed account. Access Advisor data has known delays—sometimes 4+ hours—and doesn't capture CloudTrail-level detail about specific API calls. For critical roles with infrequent but essential operations (disaster recovery roles, quarterly compliance jobs), Access Advisor might show a service as unused simply because it hasn't been needed during the observation window. The 90-day default lookback helps but doesn't eliminate this risk. You'll need comprehensive filters and possibly manual exemptions for edge cases. Additionally, since Repokid only manages inline policies, organizations that primarily use AWS managed policies or customer managed policies won't benefit without restructuring their IAM strategy.
Verdict
Use if: You manage 50+ AWS accounts with rapid deployment velocity where developers routinely create over-permissioned roles, you already have infrastructure-as-code maturity and can deploy supporting services like DynamoDB and Aardvark, and you're willing to invest engineering time in tuning filters and monitoring rollbacks during a multi-month rollout. Repokid shines in environments where the cost of manual IAM audits exceeds the cost of building automation.
Skip if: You have fewer than 20 accounts where manual quarterly reviews are feasible, you primarily use AWS managed policies that Repokid can't modify, your roles perform infrequent critical operations that Access Advisor won't capture, or you lack the operational maturity to safely run automated permission changes in production. Smaller organizations should start with AWS IAM Access Analyzer's unused access findings and manual remediation before investing in Repokid's complexity.