Building a Slack Doppelgänger: Fine-Tuning LLMs on Your Message History with Modal
Hook
What if your colleagues couldn't tell whether you or an AI wrote that Slack message? DoppelBot makes this uncomfortably possible by fine-tuning language models on your entire message history.
Context
The explosion of LLM capabilities has created a new problem: most developers know how to call GPT-4's API, but few understand how to personalize models with their own data. Fine-tuning remains mysterious, buried behind research papers and enterprise ML platforms. Meanwhile, Slack workspaces contain goldmines of conversational data—thousands of messages capturing individual writing styles, domain knowledge, and response patterns.
DoppelBot bridges this gap by demonstrating a complete, end-to-end pipeline for personalizing language models. Published by Modal Labs, it showcases how to combine Slack's API, serverless infrastructure, and LLM fine-tuning into a single coherent application. Rather than abstracting away complexity, it exposes the full workflow: data collection, model training, and inference deployment. This makes it valuable not just as a quirky Slack bot, but as a reference architecture for anyone building personalized AI applications.
Technical Insight
DoppelBot's architecture reveals how to orchestrate complex ML workflows without managing a single server. The entire system runs on Modal, a serverless platform designed specifically for compute-intensive workloads like model training.
The data collection phase uses Slack's Web API to scrape a target user's message history. This isn't just dumping raw messages—it constructs conversational pairs by matching messages with their context. When user Alice responds to Bob's question, DoppelBot captures both the question and Alice's answer as a training example. This pairing is crucial: language models learn from input-output relationships, not isolated text fragments. The scraper respects Slack's rate limits and handles pagination, accumulating hundreds or thousands of conversational turns depending on how active the user is.
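The pairing step described above can be sketched in a few lines. This is a minimal illustration, not DoppelBot's actual scraper: the function name build_training_pairs, the max_context parameter, and the "name: text" prompt layout are all assumptions, and the message dictionaries only borrow the user, ts, and text field names that Slack uses.

```python
from typing import Dict, List


def build_training_pairs(
    messages: List[Dict[str, str]],
    target_user: str,
    max_context: int = 3,
) -> List[Dict[str, str]]:
    """Pair each reply from target_user with the messages that preceded it."""
    # Slack timestamps are strings such as "1700000000.000100"; sort numerically.
    ordered = sorted(messages, key=lambda m: float(m["ts"]))
    pairs: List[Dict[str, str]] = []
    context: List[str] = []
    for msg in ordered:
        if msg["user"] == target_user:
            if context:  # skip replies that have nothing to respond to
                pairs.append(
                    {
                        "prompt": "\n".join(context[-max_context:]),
                        "completion": msg["text"],
                    }
                )
            context = []  # start a fresh context after each captured reply
        else:
            context.append(f'{msg["user"]}: {msg["text"]}')
    return pairs
```

A real pipeline would likely group messages by thread before pairing and might merge consecutive messages from the target user; this sketch simply skips a reply when no other speaker precedes it.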
The fine-tuning stage demonstrates Modal's strength. Here's a simplified version of how DoppelBot configures the training job:
import modal

stub = modal.Stub("doppel-bot")

image = modal.Image.debian_slim().pip_install(
    "torch",
    "transformers",
    "accelerate",  # Required by transformers for quantized loading and Trainer
    "peft",  # Parameter-Efficient Fine-Tuning
    "bitsandbytes",  # For quantization
)


@stub.function(
    gpu="A10G",
    timeout=3600,  # 1 hour
    image=image,
    secret=modal.Secret.from_name("slack-oauth"),
)
def fine_tune_model(user_id: str, messages: list):
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        Trainer,
        TrainingArguments,
    )
    from peft import LoraConfig, get_peft_model

    # Load base OpenLLaMA model
    model = AutoModelForCausalLM.from_pretrained(
        "openlm-research/open_llama_3b",
        load_in_8bit=True,  # Quantize to fit in GPU memory
    )
    tokenizer = AutoTokenizer.from_pretrained("openlm-research/open_llama_3b")

    # Configure LoRA for efficient fine-tuning
    lora_config = LoraConfig(
        r=16,  # Low-rank dimension
        lora_alpha=32,
        target_modules=["q_proj", "v_proj"],
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)

    # Format messages as prompt-completion pairs.
    # format_conversations is a helper not shown here; it is assumed to use
    # the tokenizer and return a tokenized, padded dataset with labels.
    training_data = format_conversations(messages, user_id)

    # Train and save adapter weights (default hyperparameters, for illustration)
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=f"models/{user_id}"),
        train_dataset=training_data,
    )
    trainer.train()
    model.save_pretrained(f"models/{user_id}")
This code reveals several architectural decisions. First, DoppelBot uses OpenLLaMA, an openly licensed model, rather than a proprietary one, giving complete control over the fine-tuning process. Second, it employs LoRA (Low-Rank Adaptation), a parameter-efficient technique that freezes the base weights and trains only small low-rank adapter matrices attached to the targeted layers (here, the attention query and value projections). This shortens training enough to fit within the one-hour timeout set above and cuts GPU memory requirements substantially, because gradients and optimizer state are kept only for the adapter weights. Third, 8-bit quantization stores the frozen base weights at one byte per parameter, making it possible to fine-tune a 3-billion-parameter model on a single 24 GB A10G GPU.
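To make the savings concrete, here is a back-of-the-envelope calculation of how many weights LoRA actually trains. The model dimensions used in the usage note (hidden size 3200, 26 layers, roughly 3.4 billion parameters) are assumptions about open_llama_3b rather than figures from the article, so check them against the model's config before relying on the numbers. The helper names are hypothetical.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) beside a frozen d_out x d_in weight."""
    return rank * (d_in + d_out)


def adapter_summary(
    hidden_size: int,
    num_layers: int,
    rank: int,
    targets_per_layer: int,
    base_params: float,
) -> dict:
    """Estimate trainable adapter weights for square attention projections."""
    per_module = lora_param_count(hidden_size, hidden_size, rank)
    trainable = per_module * targets_per_layer * num_layers
    return {
        "trainable": trainable,
        "fraction": trainable / base_params,
        # 8-bit quantization stores one byte per frozen base parameter.
        "int8_weight_gb": base_params / 1e9,
    }


# Assumed dimensions for open_llama_3b; r=16 on q_proj and v_proj as configured above.
summary = adapter_summary(
    hidden_size=3200, num_layers=26, rank=16, targets_per_layer=2, base_params=3.4e9
)
```

Under those assumptions the adapters hold about 5.3 million trainable weights, roughly 0.16% of the base model, while the quantized base weights occupy about 3.4 GB. Activations and optimizer state add to that, but the total stays well within a single GPU's memory.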
The multi-tenancy implementation shows production engineering maturity. When distributed across workspaces, DoppelBot can't hardcode credentials. Instead, it implements Slack's OAuth flow, redirecting workspace admins through Slack's authorization screens and storing the resulting tokens in Neon's serverless Postgres:
import os


@stub.function()
@modal.web_endpoint(method="GET")
def oauth_callback(code: str, state: str):
    # Imported inside the function so they resolve within the container image
    import psycopg2
    import requests

    # Exchange code for access token
    response = requests.post(
        "https://slack.com/api/oauth.v2.access",
        data={
            "client_id": os.environ["SLACK_CLIENT_ID"],
            "client_secret": os.environ["SLACK_CLIENT_SECRET"],
            "code": code,
        },
    )
    payload = response.json()  # Slack replies with a JSON body
    if not payload.get("ok"):
        raise RuntimeError(f"Slack OAuth failed: {payload.get('error')}")

    # Store in Neon Postgres
    conn = psycopg2.connect(os.environ["DATABASE_URL"])
    try:
        cursor = conn.cursor()
        cursor.execute(
            "INSERT INTO installations (team_id, bot_token, user_id) VALUES (%s, %s, %s)",
            (payload["team"]["id"], payload["access_token"], state),
        )
        conn.commit()
    finally:
        conn.close()
This separation of credentials per workspace is essential for distribution. Each workspace's bot token is stored in its own row keyed by team_id, so a request from one organization is never served with another organization's token.
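The per-workspace lookup can be sketched with Python's built-in sqlite3 module standing in for Neon Postgres. The table layout mirrors the INSERT statement above, but the primary key on team_id, the replace-on-reinstall behavior, and the function names are assumptions for illustration, not details taken from the repository.

```python
import sqlite3
from typing import Optional


def init_store(conn: sqlite3.Connection) -> None:
    """Create the installations table with one row per workspace."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS installations ("
        "team_id TEXT PRIMARY KEY, bot_token TEXT NOT NULL, user_id TEXT)"
    )
    conn.commit()


def save_installation(
    conn: sqlite3.Connection, team_id: str, bot_token: str, user_id: str
) -> None:
    """Insert a workspace's token, replacing the row if the app is reinstalled."""
    conn.execute(
        "INSERT OR REPLACE INTO installations (team_id, bot_token, user_id) "
        "VALUES (?, ?, ?)",
        (team_id, bot_token, user_id),
    )
    conn.commit()


def token_for_team(conn: sqlite3.Connection, team_id: str) -> Optional[str]:
    """Return the bot token for exactly one workspace, or None if not installed."""
    row = conn.execute(
        "SELECT bot_token FROM installations WHERE team_id = ?", (team_id,)
    ).fetchone()
    return row[0] if row else None
```

With psycopg2 against Postgres, the placeholders would be %s rather than ?, and the replace-on-reinstall behavior would typically be written as INSERT ... ON CONFLICT (team_id) DO UPDATE. Every query filters on team_id, which is what keeps tokens from crossing workspace boundaries.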
The inference endpoint ties everything together. When mentioned in Slack, DoppelBot retrieves the fine-tuned LoRA weights, loads them onto the base model, and generates a response:
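@stub.function(gpu="T4", keep_warm=1)
@modal.web_endpoint(method="POST")
def respond_to_mention(event: dict):
    from transformers import AutoTokenizer

    user_id = get_target_user(event["team_id"])
    context = get_thread_context(event["channel"], event["ts"])

    # Load fine-tuned adapter onto the base model
    model = load_base_model()
    model.load_adapter(f"models/{user_id}")
    tokenizer = AutoTokenizer.from_pretrained("openlm-research/open_llama_3b")

    # Generate response: generate() expects token IDs, not a raw string
    prompt = format_prompt(context)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=150)
    # Decode only the newly generated tokens, skipping the echoed prompt
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    response = tokenizer.decode(new_tokens, skip_special_tokens=True)

    # Post to Slack
    post_message(event["channel"], response, thread_ts=event["ts"])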
The keep_warm=1 parameter tells Modal to maintain one hot instance, reducing cold-start latency. This matters for Slack bots: users expect quick replies, and Slack's Events API expects an acknowledgment within three seconds. The tradeoff is cost. Keeping a GPU warm isn't free, but for a production bot, responsiveness justifies the expense.
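The endpoint above calls a format_prompt helper that the article does not show. One plausible shape for it, together with a small post-processing step, is sketched below. Both functions are hypothetical: the "name: text" layout, the extra target_name argument, and the rule of cutting the reply at the first line break are assumptions chosen for illustration.

```python
from typing import Dict, List


def format_prompt(context: List[Dict[str, str]], target_name: str) -> str:
    """Render thread messages as 'name: text' lines, ending with a cue for the target user."""
    lines = [f'{m["user"]}: {m["text"]}' for m in context]
    lines.append(f"{target_name}:")
    return "\n".join(lines)


def extract_reply(decoded: str, prompt: str) -> str:
    """Keep only the target user's first generated turn."""
    # Causal models often echo the prompt; drop it when present.
    continuation = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    # The model may keep writing other speakers' turns; stop at the first line break.
    return continuation.strip().split("\n", 1)[0].strip()
```

The inference prompt should use the same layout as the training prompts, since the adapter has only learned to continue text in the format it saw during fine-tuning.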
Gotcha
DoppelBot's biggest limitation is its one-user-per-workspace constraint. Once you've chosen a target user and completed fine-tuning, there's no built-in mechanism to switch targets or retrain. If you want to impersonate a different person, you're manually editing configuration files and re-running the entire pipeline. For a tool called 'DoppelBot,' the inability to easily manage multiple doppelgängers feels like a significant oversight.
Training time presents another challenge. The one-hour fine-tuning window per user means experimentation is expensive. Want to test whether including emoji improves personality matching? That's another hour. Wondering if filtering out short messages helps? Another hour. This slow iteration cycle makes it difficult to optimize results, especially when output quality varies wildly based on input data. Users with sparse Slack histories (new employees, occasional contributors) often produce generic, unsatisfying impersonations because the model hasn't seen enough examples to learn meaningful patterns. The repository provides no guidance on minimum message thresholds or data quality checks, leaving developers to discover these requirements through trial and error.
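A cheap pre-flight check can catch sparse or noisy histories before an hour of GPU time is spent. The sketch below is one way to do it. The thresholds (min_words=3, min_examples=500) are arbitrary placeholders, not values recommended by the repository, and the function names are hypothetical.

```python
from typing import Dict, List


def usable_messages(
    messages: List[Dict[str, str]], min_words: int = 3
) -> List[Dict[str, str]]:
    """Drop empty, very short, and exact-duplicate messages."""
    seen = set()
    kept = []
    for msg in messages:
        text = msg.get("text", "").strip()
        key = text.lower()
        if len(text.split()) < min_words or key in seen:
            continue
        seen.add(key)
        kept.append(msg)
    return kept


def preflight(
    messages: List[Dict[str, str]], min_words: int = 3, min_examples: int = 500
) -> Dict[str, object]:
    """Summarize whether a user's history looks large enough to fine-tune on."""
    kept = usable_messages(messages, min_words)
    return {
        "total": len(messages),
        "usable": len(kept),
        "enough_data": len(kept) >= min_examples,
    }
```

Running this check before launching the training job turns the trial-and-error described above into a quick local decision. The right thresholds still have to be found empirically for each workspace.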
Verdict
Use DoppelBot if you're learning how to build end-to-end ML applications and want a concrete example of serverless fine-tuning workflows. It's exceptionally valuable as a reference implementation, demonstrating OAuth handling, GPU orchestration, and model deployment patterns you can adapt to other personalization use cases. It's also perfect if you're specifically exploring how conversational data can fine-tune language models and want working code rather than theoretical tutorials. Skip it if you need production-ready user impersonation with quality guarantees—the output is hit-or-miss depending on data volume. Also skip if you want to manage multiple users per workspace, need quick iteration cycles for experimentation, or require real-time model updates as users send new messages. The six-star count accurately reflects its status: this is an educational prototype showcasing technical possibilities, not a polished product ready for widespread deployment.