
GhostLine: Building a Voice-Cloning Vishing Framework with FastAPI, Twilio, and LLMs



Hook

What if a phone-based social engineering attack could handle conversations autonomously while an operator monitors from the sidelines? GhostLine makes vishing as repeatable as an email phishing kit.

Context

Traditional vishing (voice phishing) attacks require skilled human operators who can think on their feet, adapt to unexpected responses, and maintain consistent personas across dozens of calls. This creates a scalability problem for red teams conducting authorized security assessments: you need talented social engineers, each call requires full human attention, and results vary based on operator skill and fatigue.

GhostLine addresses this by treating phone-based social engineering as an orchestration problem. It chains together Twilio for telephony, Deepgram for real-time speech-to-text, OpenAI’s GPT models for conversational intelligence, and ElevenLabs for voice cloning and text-to-speech. The result is a system that can initiate calls, hold naturalistic conversations, adapt tactics based on victim responses, and capture credentials with minimal human intervention. The framework runs on a local FastAPI server with an ngrok tunnel. As the README states: “Feed it a number. Your cloned voice does the social engineering, while you sip your coffee.” This is an offensive security tool designed for authorized penetration testing engagements where you need to test human vulnerability at scale.

Technical Insight

GhostLine’s architecture centers on a FastAPI server exposing two critical endpoints—/voice for handling Twilio Voice Webhooks and /twilio for WebSocket media streams. When you initiate a call with python main.py call +15551234567 --persona calm --campaign demo, the CLI hits Twilio’s REST API to start the call, Twilio connects to your ngrok-tunneled /voice endpoint, and the server responds with TwiML that establishes a bidirectional WebSocket connection for streaming audio.

The core of the system runs three concurrent coroutines per call. The first consumes incoming audio from the WebSocket and pipes µ-law frames (the 8 kHz PSTN standard) to Deepgram’s streaming STT endpoint. The second watches for silence to detect when the victim has stopped speaking. The third drives the LLM conversation loop and queues synthesized audio for transmission back through Twilio. Every stage streams: Deepgram emits partial transcripts as words arrive, OpenAI’s chat completion streams tokens, and ElevenLabs generates audio incrementally.
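For readers unfamiliar with the µ-law encoding mentioned above, here is a minimal sketch of the standard G.711 µ-law-to-linear expansion. It is not taken from GhostLine, which may simply forward the raw µ-law bytes to Deepgram without decoding them; the function names are hypothetical and only illustrate what each 8-bit sample in a frame represents.

```python
# Standard G.711 µ-law expansion: 8-bit code word -> signed 16-bit linear PCM.
# Illustrative sketch; function names are hypothetical, not GhostLine's.
BIAS = 0x84  # 132, the bias G.711 adds to the magnitude before companding


def ulaw_to_linear(code: int) -> int:
    """Expand one 8-bit µ-law code word to a signed linear sample."""
    code = ~code & 0xFF            # µ-law bytes are transmitted bit-inverted
    sign = code & 0x80             # top bit carries the sign
    exponent = (code >> 4) & 0x07  # segment number (0-7)
    mantissa = code & 0x0F         # step within the segment
    magnitude = ((mantissa << 3) + BIAS) << exponent
    magnitude -= BIAS
    return -magnitude if sign else magnitude


def decode_frame(frame: bytes) -> list[int]:
    """Decode a frame of µ-law bytes; 20 ms of audio at 8 kHz is 160 bytes."""
    return [ulaw_to_linear(b) for b in frame]


if __name__ == "__main__":
    silence = bytes([0xFF]) * 160      # 0xFF encodes zero amplitude
    samples = decode_frame(silence)
    print(len(samples), max(samples))  # 160 0
```

The logarithmic segments are why 8 bits per sample suffice for telephone speech: quiet sounds get fine amplitude steps, loud ones coarse steps, with a full-scale range of ±32124.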

What makes GhostLine particularly interesting is its 12-stage persuasion engine. Instead of hardcoding conversation logic, the tool ships with YAML playbooks that define tactical stages, including rapport building, credibility establishment, objection handling, urgency creation, information gathering, credential capture, confirmation, and exit strategies. Each stage includes a system prompt that adjusts the LLM’s tone and objectives. The README includes a state diagram showing the transitions:

RAPPORT → CREDIBILITY → URGENCY → INFORMATION_GATHERING → 
CREDENTIAL_CAPTURE → CONFIRMATION → SUCCESS/FAILURE

# With objection handling as a fallback state
CREDIBILITY → OBJECTION_HANDLING → URGENCY (if resolved)
CREDIBILITY → OBJECTION_HANDLING → FAILURE (if unresolved)

The LLM receives the current stage’s system prompt along with conversation history, then generates contextually appropriate responses. If the victim challenges credibility, the system transitions to OBJECTION_HANDLING with prompts designed to deflect suspicion. If urgency is accepted, it moves to INFORMATION_GATHERING to extract target data. This state machine approach decouples persuasion tactics from implementation details—you can swap entire attack flows by changing the YAML playbook.

Every conversation generates forensic-grade evidence. The SQLite schema stores SHA-256 hashes of transcripts, audio frames, and state transitions. As the README states: “Evidence-grade logging—every frame & transcript SHA-256’d into SQLite.” This design reflects the tool’s intended use case: formal security assessments where you need defensible documentation of what the AI said, when it said it, and what data was captured.
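The README does not spell out the schema, so the following is only a minimal sketch, assuming a single table in which each record stores its payload next to the SHA-256 digest computed at write time. The table, column, and function names are hypothetical. Re-hashing the stored payloads later reveals any row that was altered after logging.

```python
import hashlib
import sqlite3
import time

# Hypothetical schema: table and column names are illustrative, not GhostLine's.
SCHEMA = """
CREATE TABLE IF NOT EXISTS evidence (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    call_id TEXT NOT NULL,
    kind    TEXT NOT NULL,  -- e.g. 'transcript', 'audio_frame', 'state'
    ts      REAL NOT NULL,  -- Unix timestamp when the row was written
    sha256  TEXT NOT NULL,  -- hex digest of payload at write time
    payload BLOB NOT NULL
)
"""


def open_evidence_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_evidence(conn: sqlite3.Connection, call_id: str, kind: str,
                 payload: bytes) -> str:
    """Store a payload with its SHA-256 digest and return the digest."""
    digest = hashlib.sha256(payload).hexdigest()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO evidence (call_id, kind, ts, sha256, payload) "
            "VALUES (?, ?, ?, ?, ?)",
            (call_id, kind, time.time(), digest, payload),
        )
    return digest


def find_tampered(conn: sqlite3.Connection) -> list[int]:
    """Return ids of rows whose payload no longer matches its stored digest."""
    rows = conn.execute("SELECT id, sha256, payload FROM evidence ORDER BY id")
    return [row_id for row_id, digest, payload in rows
            if hashlib.sha256(payload).hexdigest() != digest]


if __name__ == "__main__":
    db = open_evidence_db()
    log_evidence(db, "call-1", "transcript", b"example transcript line")
    print(find_tampered(db))  # []
```

Per-row digests catch accidental or naive edits, but anyone who can write to the database can also recompute the digest. Stronger guarantees would require exporting the digests to separate storage or chaining each digest to the previous one.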

Voice cloning happens once via python main.py clone assets/it_sample.wav --name helpdesk, which uploads a sample to ElevenLabs and returns a voice ID. That ID gets passed to the server at startup with --voice-id helpdesk, and every subsequent TTS request uses that cloned voice. Combined with an LLM that generates contextually appropriate dialogue, the system can produce convincing impersonations.

The network architecture deliberately segments trust zones. The operator’s local machine runs the FastAPI server and SQLite database, the ngrok tunnel handles TLS termination, Twilio’s edge infrastructure manages PSTN connectivity, and the SaaS AI providers process the intelligence layers. From the victim’s perspective, the call originates from Twilio’s PSTN gateway—it appears as standard business telephony with proper caller ID if configured. The README notes: “C2 in Plain Sight—Media traffic is indistinguishable from normal Twilio Voice Streams (µ-law @ 8 kHz). IDS rules that alert on weird HTTPS hosts ignore it.”

Gotcha

GhostLine depends entirely on commercial SaaS providers. If Twilio rate-limits your account, Deepgram’s WebSocket drops, OpenAI returns a 429, or ElevenLabs suffers an outage, your operation stops. There’s no offline mode or local model fallback. You’re running a telephony operation through four third-party APIs, each with its own reliability profile, rate limits, and terms of service.

The legal and ethical constraints are severe. Unauthorized use violates wiretapping laws, fraud statutes, and telecom regulations across most jurisdictions. Even in authorized penetration tests, you need explicit written rules of engagement that specifically permit vishing attacks, voice recording, and credential capture. The README states unequivocally: “No signed Rules of Engagement? No dialing. GhostLine is intended strictly for authorized security assessments only.” This isn’t a tool for casual experimentation—it’s a legally high-risk capability that requires mature operational security and legal review.

While the playbook system provides structure, the LLM makes the actual conversation flow less predictable. System prompts and playbooks guide its behavior but cannot guarantee it, so expect occasional off-script responses during live calls.

Verdict

Use GhostLine if you’re conducting authorized red team engagements where you need to demonstrate social engineering vulnerabilities, test employee security awareness training effectiveness, or provide evidence of vishing defense gaps. You need explicit written authorization, institutional legal backing, and a requirement to document every aspect of the engagement with forensic precision. The 12-stage playbook system enables structured, repeatable attacks with consistent personas. Skip it if you lack formal authorization with signed rules of engagement, need offline capability or independence from third-party dependencies, or operate in jurisdictions with restrictive telecom regulations. As the README emphasizes: “Be good, and hack on ethically!” This is a high-impact offensive security tool that demands mature operational discipline—not a casual pentesting utility. If you can’t show the legal paperwork that explicitly permits voice-based social engineering for security testing purposes, you shouldn’t be running GhostLine.
