CloudGPT: When ChatGPT Audits Your AWS IAM Policies (And Why That's Fascinating)
Hook
What if instead of writing regex patterns and policy rules, you could just ask an AI "Is this AWS policy vulnerable?" CloudGPT does exactly that—and the results are both promising and concerning.
Context
AWS IAM policy analysis has traditionally been the domain of rule-based engines. Tools like Parliament and AWS Access Analyzer rely on predetermined rules, regular expressions, and algorithmic analysis to detect overly permissive policies, resource exposure, and privilege escalation paths. These tools are deterministic, predictable, and auditable—but they're also limited to the specific vulnerabilities their creators anticipated.
Enter the LLM era. CloudGPT represents an early experiment in a fundamentally different approach: what if we leveraged the pattern-recognition capabilities of large language models trained on vast amounts of security knowledge? Instead of encoding rules for every possible IAM misconfiguration, could we simply describe the problem space to an AI and let it reason about policy vulnerabilities? This tool doesn't replace traditional scanners—it couldn't and shouldn't—but it explores whether LLMs can surface the kinds of nuanced, context-dependent issues that slip through rule-based systems. It's a proof-of-concept that raises important questions about the future of security tooling in an AI-augmented world.
Technical Insight
CloudGPT's architecture is refreshingly straightforward: it's essentially a connector between AWS's boto3 SDK and OpenAI's API, with critical privacy protections in between. The tool starts by using boto3 to enumerate all customer-managed IAM policies in your AWS account, retrieves their JSON policy documents, and then performs a crucial redaction step before anything leaves your infrastructure.
The redaction mechanism is simple but important. CloudGPT generates a random 12-digit account number and replaces all instances of your actual AWS account ID in the policy JSON. This prevents your AWS account identifier from being sent to OpenAI's servers. Here's the core logic:
def redact_policy(policy_document, account_id):
fake_account = ''.join([str(random.randint(0, 9)) for _ in range(12)])
redacted = policy_document.replace(account_id, fake_account)
return redacted, fake_account
Once redacted, each policy is sent to ChatGPT with a carefully crafted prompt that asks the model to analyze the policy for security vulnerabilities. The prompt engineering here is critical—it needs to be specific enough to get actionable responses but general enough to catch diverse vulnerability types. The tool looks for explicit "Yes" or "No" responses in ChatGPT's output to determine if a vulnerability exists.
What makes this approach interesting is what it can theoretically catch that rule-based systems might miss. Consider a policy that grants s3:GetObject on arn:aws:s3:::company-backups-* but also grants s3:PutBucketPolicy on all resources. A rule-based system might flag the wildcard s3:PutBucketPolicy separately, but an LLM could potentially recognize the specific privilege escalation path: an attacker could modify bucket policies on the backup buckets to grant themselves additional access. This kind of multi-step, context-dependent reasoning is where LLMs theoretically excel.
The tool's output parsing is pragmatic. It doesn't try to extract structured data from ChatGPT's responses—instead, it presents the full conversational output alongside a simple vulnerable/not-vulnerable flag. This design acknowledges that the real value isn't in automated decision-making but in giving security engineers another perspective to consider:
if 'yes' in response.lower():
print(f"[!] Policy '{policy_name}' may be vulnerable")
print(f"Analysis: {response}")
else:
print(f"[✓] Policy '{policy_name}' appears secure")
The implementation runs serially through policies, making individual API calls to OpenAI for each one. There's no batching, caching, or result persistence—every run is fresh. This simplicity makes the code easy to understand and modify, but it also means scanning a large AWS environment could rack up significant OpenAI API costs and take considerable time.
One architectural choice worth noting: CloudGPT only examines customer-managed policies, not inline policies attached directly to users, groups, or roles. This is likely a pragmatic decision to keep scope manageable, but it means the tool misses a significant attack surface. Inline policies are often where the most dangerous permissions hide, precisely because they're less visible than managed policies.
Gotcha
The fundamental limitation is also the tool's core innovation: relying on an LLM for security decisions introduces non-determinism into your security posture. Run the same policy through CloudGPT twice and you might get different results. ChatGPT's responses depend on model version, temperature settings, prompt interpretation, and factors outside your control. A policy flagged as vulnerable today might pass tomorrow, not because anything changed, but because the LLM's response varied. For security tooling, where consistency and auditability matter enormously, this is deeply problematic.
The parsing strategy is also fragile. CloudGPT looks for "yes" or "no" in ChatGPT's freeform responses to determine vulnerability status. But LLMs don't always structure responses predictably—you might get "Yes, but only if...", "Not necessarily", or a nuanced explanation that doesn't contain simple boolean indicators. The tool will miss or misclassify these cases. Additionally, there's no validation of ChatGPT's reasoning. The model might hallucinate AWS API behaviors, misunderstand IAM evaluation logic, or miss vulnerabilities that a deterministic scanner would catch immediately. Without ground truth testing against known vulnerable and secure policies, you're flying blind. This isn't a knock on the tool's implementation—it's an inherent limitation of using LLMs for security-critical decisions in 2024.
Verdict
Use if: You're researching LLM applications in security tooling and want a concrete example of AI-assisted policy analysis, or you want a supplementary "second opinion" tool to run alongside established scanners and have budget for OpenAI API calls. It's particularly interesting if you're exploring prompt engineering for security contexts or building your own AI-augmented security tools. Skip if: You need reliable, production-grade IAM policy analysis. Use AWS IAM Access Analyzer for provable security properties, Parliament for deterministic linting, or Cloudsplaining for systematic least-privilege analysis instead. Don't use CloudGPT as your primary or only IAM scanner—the non-deterministic nature of LLM responses makes it unsuitable for security decisions without extensive human review. It's a fascinating experiment and conversation starter, but not a replacement for battle-tested tools built on algorithmic certainty.