VISTA: Building an AWS Security Scanner with LangGraph Intelligence

Hook

Most security scanners tell you what might be wrong. VISTA uses LangGraph workflows to validate whether vulnerabilities are actually exploitable—a crucial difference that separates noise from real threats.

Context

AWS security scanning has always suffered from a signal-to-noise problem. Traditional tools flag every public S3 bucket, every permissive security group, and every IAM policy with wildcards, leaving security teams drowning in alerts, most of which turn out to be false positives or acceptable risks. You've seen this: a scanner screams about an "overly permissive" IAM role that's actually scoped correctly for your use case, or flags a security group as "vulnerable" when it sits behind three other layers of defense.

VISTA takes a different approach by integrating LangGraph, an orchestration framework for building stateful, multi-actor workflows, into the vulnerability assessment process. Instead of simply pattern-matching against security rules, it attempts to check whether each flagged configuration is actually exploitable. Built entirely on AWS serverless infrastructure, it positions itself as a native option for teams already invested in the AWS ecosystem who want security scanning without managing additional infrastructure.

Technical Insight

The architecture reveals thoughtful decisions about serverless constraints and security boundaries. At its core, VISTA separates concerns across three Lambda functions, each handling a specific domain: scanning execution, results retrieval, and history management. This separation isn't just organizational—it's a practical response to Lambda's execution limits.

The scanning Lambda represents the most sophisticated component. It instantiates a LangGraph workflow that orchestrates multiple steps: credential validation, resource enumeration, configuration analysis, and exploitability testing. Here's how a simplified scanning workflow might be structured:

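from typing import TypedDict

import boto3
from langgraph.graph import StateGraph, END


class ScanState(TypedDict):
    """Shared state passed from node to node in the workflow."""
    credentials: dict                # keyword arguments for boto3.Session
    resources: dict                  # resource type -> list of raw API records
    findings: list                   # every configuration that matched a rule
    validated_vulnerabilities: list  # the subset judged exploitable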

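def enumerate_resources(state: ScanState):
    session = boto3.Session(**state['credentials'])
    iam = session.client('iam')
    ec2 = session.client('ec2')

    # Both APIs return results in pages, so walk every page instead of
    # reading only the first response.
    policies = []
    for page in iam.get_paginator('list_policies').paginate(Scope='Local'):
        policies.extend(page['Policies'])

    security_groups = []
    for page in ec2.get_paginator('describe_security_groups').paginate():
        security_groups.extend(page['SecurityGroups'])

    state['resources'] = {
        'iam_policies': policies,
        'security_groups': security_groups,
    }
    return state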

def analyze_configurations(state: ScanState):
    findings = []
    for sg in state['resources']['security_groups']:
        for rule in sg.get('IpPermissions', []):
            # Flag ingress rules open to the whole internet, IPv4 or IPv6
            open_v4 = any(ip.get('CidrIp') == '0.0.0.0/0' for ip in rule.get('IpRanges', []))
            open_v6 = any(ip.get('CidrIpv6') == '::/0' for ip in rule.get('Ipv6Ranges', []))
            if open_v4 or open_v6:
                findings.append({
                    'type': 'OPEN_SECURITY_GROUP',
                    'resource': sg['GroupId'],
                    'severity': 'HIGH',
                    'details': rule
                })
    state['findings'] = findings
    return state

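def actual_vulnerability_exists(finding: dict) -> bool:
    # Hypothetical placeholder: VISTA's real checks are not shown here.
    # A real implementation would perform the checks listed in
    # validate_exploitability; until then, conservatively keep every finding.
    return True

def validate_exploitability(state: ScanState):
    validated = []
    for finding in state['findings']:
        if finding['type'] == 'OPEN_SECURITY_GROUP':
            # Check if instances actually exist behind this SG
            # Check if the port is actually serving an application
            # Determine if the exposure is intentional (ALB, etc.)
            if actual_vulnerability_exists(finding):
                validated.append(finding)
    state['validated_vulnerabilities'] = validated
    return state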

workflow = StateGraph(ScanState)
workflow.add_node('enumerate', enumerate_resources)
workflow.add_node('analyze', analyze_configurations)
workflow.add_node('validate', validate_exploitability)
workflow.set_entry_point('enumerate')
workflow.add_edge('enumerate', 'analyze')
workflow.add_edge('analyze', 'validate')
workflow.add_edge('validate', END)

app = workflow.compile()

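This graph-based approach allows VISTA to maintain state across scanning stages: each node reads what the previous one wrote. If the enumeration phase discovers 500 security groups but analysis flags only 10 as suspicious, the validation phase spends its effort on those 10 rather than exhaustively testing everything. The simplified graph above is strictly linear; routing decisions, such as skipping validation entirely when nothing is flagged, would be expressed with LangGraph's conditional edges.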
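The graph in the example is wired as a straight line, so it never actually branches. As a minimal sketch of what a routing decision could look like, the function below picks the next step based on the state. The function name and the "skip" label are illustrative assumptions, not taken from the VISTA repository; the commented wiring uses LangGraph's add_conditional_edges.

```python
from typing import List, TypedDict


class ScanState(TypedDict, total=False):
    findings: List[dict]
    validated_vulnerabilities: List[dict]


def route_after_analysis(state: ScanState) -> str:
    """Pick the next step: validate only when analysis flagged something."""
    return "validate" if state.get("findings") else "skip"


# Hypothetical wiring, replacing the unconditional analyze -> validate edge:
# workflow.add_conditional_edges(
#     "analyze",
#     route_after_analysis,
#     {"validate": "validate", "skip": END},
# )

if __name__ == "__main__":
    print(route_after_analysis({"findings": []}))
    print(route_after_analysis({"findings": [{"type": "OPEN_SECURITY_GROUP"}]}))
```

Keeping the router a pure function of the state makes it easy to test without building the graph or touching AWS.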

The authentication boundary uses Cognito for frontend access control, a sensible choice that avoids building custom authentication. But the credentials flow reveals an architectural tension: users enter AWS credentials through the web interface, and those credentials are passed to Lambda for scanning. The repository claims they aren't stored permanently, suggesting they live only in the Lambda execution context. This ephemeral approach balances security with functionality, though it means users must re-enter their AWS credentials for each scan.
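One way to keep credentials ephemeral is to separate them from everything else in the request before anything is persisted. The sketch below is an assumption about how that could be done, not VISTA's actual code; the function name is hypothetical, and the field names simply mirror the keyword arguments that boto3.Session accepts.

```python
from typing import Tuple

# Field names match boto3.Session keyword arguments; their use in the
# request body is an assumption for this sketch.
CREDENTIAL_KEYS = ("aws_access_key_id", "aws_secret_access_key", "aws_session_token")


def split_scan_request(body: dict) -> Tuple[dict, dict]:
    """Separate credentials (memory only) from fields that are safe to persist."""
    credentials = {k: body[k] for k in CREDENTIAL_KEYS if k in body}
    persistable = {k: v for k, v in body.items() if k not in CREDENTIAL_KEYS}
    return credentials, persistable


if __name__ == "__main__":
    creds, record = split_scan_request({
        "aws_access_key_id": "AKIAEXAMPLE",
        "aws_secret_access_key": "not-a-real-secret",
        "region": "us-east-1",
    })
    print(sorted(record))  # only non-credential fields reach the database
```

Only the second dictionary would ever be written to storage or logs; the first is handed to the scan and discarded when the invocation ends.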

DynamoDB serves as the persistence layer, storing scan results with a schema built around the access patterns: list scans by user (a GSI on user_id), retrieve a specific scan's results (scan_id as the primary key), and query findings by severity. A scan item likely resembles:

{
    'scan_id': 'scan_20240115_abc123',
    'user_id': 'user@example.com',
    'timestamp': 1705334400,
    'status': 'COMPLETED',
    'findings': [
        {
            'id': 'finding_001',
            'resource_arn': 'arn:aws:iam::123456789012:policy/OverlyPermissive',
            'finding_type': 'IAM_WILDCARD',
            'severity': 'CRITICAL',
            'validated': True,
            'remediation': 'Apply least-privilege by replacing * with specific actions'
        }
    ],
    'stats': {
        'total_resources': 342,
        'findings_detected': 28,
        'exploitable_confirmed': 7
    }
}
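Assuming an item shaped like the one above, the first and third access patterns could be served roughly as follows. The table name, the index name, and the idea that the index is sorted by timestamp are all assumptions made for this sketch. Because findings are nested inside the scan item here, the severity filter runs in Python after the item is fetched rather than as a key condition.

```python
from typing import List


def build_user_scans_query(user_id: str, limit: int = 20) -> dict:
    """Build keyword arguments for the low-level DynamoDB client's query call."""
    return {
        "TableName": "vista-scans",      # hypothetical table name
        "IndexName": "user_id-index",    # hypothetical GSI name
        "KeyConditionExpression": "user_id = :uid",
        "ExpressionAttributeValues": {":uid": {"S": user_id}},
        # Newest first, assuming the index uses timestamp as its sort key.
        "ScanIndexForward": False,
        "Limit": limit,
    }


def findings_by_severity(scan_item: dict, severity: str) -> List[dict]:
    """Filter the nested findings list of one already-fetched scan item."""
    return [f for f in scan_item.get("findings", []) if f.get("severity") == severity]


# Usage (requires boto3 and AWS access):
# boto3.client("dynamodb").query(**build_user_scans_query("user@example.com"))
```

If severity queries across many scans matter, storing each finding as its own item with a severity attribute in an index would be the more typical design, at the cost of more writes per scan.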

The API Gateway integration exposes three REST endpoints with request/response transformations that shape data between frontend expectations and Lambda signatures. The /scan endpoint accepts credentials and configuration, returns a scan_id immediately, then processes the scan asynchronously. The frontend then polls the /results/{scan_id} endpoint until the scan completes, a pattern forced by Lambda's timeout constraints.
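The submit-then-poll pattern can be sketched from the client's side as a small loop. This is an illustration rather than VISTA's frontend code: fetch_result stands in for an HTTP GET against /results/{scan_id}, and the FAILED status is an assumption (only COMPLETED appears in the sample item).

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"COMPLETED", "FAILED"}  # FAILED is assumed for this sketch


def wait_for_scan(
    fetch_result: Callable[[str], dict],
    scan_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the scan reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch_result(scan_id)
        if result.get("status") in TERMINAL_STATUSES:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"scan {scan_id} did not finish within {timeout_s}s")
        sleep(interval_s)
```

Passing the fetcher and the sleep function as parameters keeps the loop testable without a network or real delays.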

The frontend visualization layer uses Chart.js to render security posture trends across scan history. This historical comparison becomes valuable for teams tracking remediation progress, showing whether the security posture is improving over time or accumulating technical debt.

Gotcha

Lambda's 15-minute maximum execution time creates the most significant operational friction. For AWS accounts with substantial resources (think hundreds of EC2 instances or thousands of IAM policies), a scan may not complete within this window. VISTA handles this by writing partial results to DynamoDB and returning the scan_id immediately, but it means you can't watch results populate in real time. You submit a scan, navigate away, then check the 'Previous Scans' page 10 minutes later hoping it finished. For developers accustomed to tools like Prowler that stream findings as they discover them, this feels like a step backward.
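A common way to live within the time limit is to watch the remaining execution time and flush what has been found before Lambda terminates the function. The sketch below assumes that approach; it is not taken from VISTA. The safety margin, the PARTIAL status, and the check and save_partial callables are hypothetical, while get_remaining_time_in_millis is the method the Lambda Python runtime provides on the context object.

```python
from typing import Callable, Iterable, List

SAFETY_MARGIN_MS = 30_000  # assumed margin reserved for the final write


def scan_within_budget(
    resources: Iterable,
    check: Callable[[object], List[dict]],
    context,
    save_partial: Callable[[List[dict]], None],
) -> dict:
    """Check resources one by one, stopping early when time is nearly up."""
    findings: List[dict] = []
    processed = 0
    for resource in resources:
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            # Persist what we have so the scan is not lost at the timeout.
            save_partial(findings)
            return {"status": "PARTIAL", "processed": processed, "findings": findings}
        findings.extend(check(resource))
        processed += 1
    save_partial(findings)
    return {"status": "COMPLETED", "processed": processed, "findings": findings}
```

A PARTIAL result could then be resumed by a follow-up invocation that starts from the processed count, which is one way to scan accounts too large for a single run.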

The credential handling approach, while architecturally defensible, creates a trust boundary that enterprises will question. Typing AWS access keys into a web form, even one you deployed, triggers security team alarm bells. There's no AWS STS assume-role integration, no documentation for cross-account scanning, and no credential management beyond "we don't store them." For scanning production environments or customer accounts, this becomes a dealbreaker. The two-star GitHub presence and absent community also mean you're largely on your own for troubleshooting, extending functionality, or validating that the LangGraph validation logic actually reduces false positives as advertised.
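For comparison, this is roughly what the missing assume-role flow would involve: the scanner calls STS to obtain short-lived credentials for a role in the target account, then builds its session from those. VISTA does not do this; the sketch is illustrative only, and the helper name, role name, and session name are hypothetical. The helper converts an assume_role response into the keyword arguments boto3.Session expects.

```python
def session_kwargs_from_assume_role(response: dict) -> dict:
    """Map the Credentials block of an STS AssumeRole response to Session kwargs."""
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


# Usage (requires boto3 and permission to assume the role; the ARN is a placeholder):
# sts = boto3.client("sts")
# response = sts.assume_role(
#     RoleArn="arn:aws:iam::123456789012:role/VistaScanRole",
#     RoleSessionName="vista-scan",
#     DurationSeconds=900,
# )
# session = boto3.Session(**session_kwargs_from_assume_role(response))
```

With this pattern no long-lived access key is ever typed into a form; the target account only has to trust the scanner's role, and the temporary credentials expire on their own.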

Verdict

Use VISTA if you're already deep in the AWS ecosystem, need a security scanner you can deploy entirely within your own AWS account without external dependencies, and value the experimental LangGraph validation approach enough to tolerate asynchronous-only scanning. It's particularly suitable for development or staging environments where credential sharing is less sensitive and you want to track security posture trends over time with minimal infrastructure overhead.

Skip VISTA if you need real-time scan results, require enterprise-grade credential management with assume-role patterns, want comprehensive coverage beyond IAM and security groups, need a battle-tested tool with community validation, or are scanning production environments where credential handling must meet compliance requirements.

For serious security assessments, Prowler or ScoutSuite offer more maturity, better credential management, and proven detection capabilities without the experimental overhead.
