Simulating a Million AI Agents on Social Media: Inside OASIS's Architecture
Hook
What happens when you let a million AI agents loose on a simulated Twitter? OASIS answers this question—and the results reveal how misinformation spreads, echo chambers form, and viral content emerges through pure computational sociology.
Context
Traditional agent-based modeling has studied social phenomena for decades, but researchers faced a fundamental constraint: simulated agents were rule-based automatons following rigid if-then logic. They could model crowds, not communities. Enter large language models, and suddenly we can simulate agents with nuanced reasoning, memory, and emergent behavior. But previous LLM-agent frameworks like Stanford's Generative Agents maxed out at 25 agents in a small town simulation—nowhere near the scale needed to study platform-level social dynamics like information cascades or polarization.
This is the gap OASIS fills. Built by the CAMEL-AI team, it's a Python framework specifically designed for large-scale social media simulation with LLM-powered agents. Unlike general-purpose multi-agent systems, OASIS models the specific mechanics of platforms like Twitter and Reddit: follows, retweets, algorithmic recommendations, and the asymmetric information flow that defines modern social networks. The goal isn't to replace real social platforms for production use—it's to create a controlled laboratory where researchers can run repeatable experiments on questions like "How does a recommendation algorithm's bias affect polarization?" or "What's the tipping point for viral misinformation spread?"
Technical Insight
OASIS's architecture centers on three core components: an agent graph, an action system, and a recommendation engine. Each agent is a node with a profile (interests, posting style, political leaning) and maintains state in a SQLite database. The agent graph isn't just a social graph—it's a computational graph where edges represent follows, blocks, and mutes that determine information flow.
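To make the idea of edges controlling information flow concrete, here is a minimal sketch in plain Python. It does not use the OASIS API: the `SocialGraph` class, the `visible_authors` method, and the rule that a mute or a block in either direction cuts the flow are all illustrative assumptions.

```python
from collections import defaultdict


class SocialGraph:
    """Toy directed graph; a hypothetical stand-in, not the OASIS agent graph."""

    def __init__(self):
        self.follows = defaultdict(set)  # follower -> accounts they follow
        self.mutes = defaultdict(set)    # user -> accounts they muted
        self.blocks = defaultdict(set)   # user -> accounts they blocked

    def follow(self, src, dst):
        self.follows[src].add(dst)

    def mute(self, src, dst):
        self.mutes[src].add(dst)

    def block(self, src, dst):
        self.blocks[src].add(dst)

    def visible_authors(self, user):
        """Authors whose posts may reach `user` through the follow graph."""
        hidden = self.mutes[user] | self.blocks[user]
        # Assumption: an author who blocked `user` is also cut off.
        blocked_by = {a for a in self.follows[user] if user in self.blocks[a]}
        return self.follows[user] - hidden - blocked_by


graph = SocialGraph()
graph.follow("alice", "bob")
graph.follow("alice", "carol")
graph.follow("alice", "dave")
graph.mute("alice", "carol")   # alice no longer sees carol
graph.block("dave", "alice")   # dave cuts alice off
print(sorted(graph.visible_authors("alice")))  # ['bob']
```

A recommendation layer would then rank only posts from the surviving authors, plus whatever out-of-network content the algorithm injects.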
The framework's power lies in its hybrid control system. Agents can operate in two modes: manual scripting for controlled experiments, or autonomous LLM-driven decision-making for emergent behavior. Here's what autonomous agent setup looks like:
from oasis import TwitterEnvironment, Agent
from oasis.llm import OpenAIProvider

# Initialize environment and LLM backend
env = TwitterEnvironment(
    db_path="simulation.db",
    rec_sys="interest_based",  # or "hot_score_based"
    action_space_size=23
)
llm = OpenAIProvider(model="gpt-4o-mini")

# Create an agent with personality traits
agent = Agent(
    agent_id="user_001",
    profile={
        "bio": "Climate researcher, coffee enthusiast",
        "interests": ["climate_science", "sustainability"],
        "political_leaning": 0.3,  # -1 to 1 scale
        "posting_frequency": "high"
    },
    llm_provider=llm,
    autonomous=True
)
env.add_agent(agent)

# Run simulation timestep
for timestep in range(100):
    # Each agent decides what to do based on their feed
    actions = env.step(
        activation_probability=0.1,  # 10% of agents act per step
        llm_temperature=0.7
    )
    env.record_metrics(timestep)
The action space includes 23 distinct behaviors: posting original content, replying, retweeting, liking, following, unfollowing, muting, blocking, searching by hashtag, and more. When an agent acts autonomously, the LLM receives the agent's profile, recent feed items, and available actions, then generates a decision. OASIS constrains the LLM output to valid action schemas using structured generation, preventing hallucinated actions.
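The article does not show how OASIS enforces its action schemas, so the sketch below illustrates a simpler, related technique: validating the model's JSON output after the fact and falling back to a no-op when it names an unknown action or the wrong arguments. The `Action` enum, the `REQUIRED_ARGS` table, and `parse_decision` are hypothetical names, and only four of the 23 actions are represented.

```python
import json
from enum import Enum


class Action(Enum):
    # A small subset of the action space, named here for illustration only.
    CREATE_POST = "create_post"
    LIKE = "like"
    FOLLOW = "follow"
    DO_NOTHING = "do_nothing"


# Required argument names per action (hypothetical schema).
REQUIRED_ARGS = {
    Action.CREATE_POST: {"content"},
    Action.LIKE: {"post_id"},
    Action.FOLLOW: {"user_id"},
    Action.DO_NOTHING: set(),
}


def parse_decision(raw):
    """Return (Action, args) for valid LLM output, else (DO_NOTHING, {})."""
    try:
        data = json.loads(raw)
        action = Action(data["action"])  # ValueError for unknown actions
        args = data.get("args", {})
    except (json.JSONDecodeError, KeyError, ValueError, TypeError, AttributeError):
        return Action.DO_NOTHING, {}
    # Reject missing or extra arguments rather than guessing.
    if not isinstance(args, dict) or set(args) != REQUIRED_ARGS[action]:
        return Action.DO_NOTHING, {}
    return action, args


print(parse_decision('{"action": "like", "args": {"post_id": 42}}'))
print(parse_decision('{"action": "hack_server", "args": {}}'))
```

The first call returns the `LIKE` action with its arguments; the second falls back to `DO_NOTHING` because `hack_server` is not in the enum. True structured generation constrains decoding itself, which avoids wasting a call on output that is later discarded.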
What makes this scalable to a million agents is the activation probability mechanism. Not all agents act simultaneously, just as real users don't all post at once. At each timestep, every agent is activated with a configurable probability, so the expected number of LLM calls per step is the population size times the activation probability rather than the full population:
# For 100k agents with 1% activation
# Only ~1,000 LLM calls per timestep instead of 100,000
env.step(activation_probability=0.01)
# Agents can have different activation probabilities
agent.set_activation_weight(2.0) # 2x more likely to be active
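One way such a mechanism could work is independent per-agent sampling, sketched below in plain Python. This is an assumption about the general technique, not OASIS's implementation: `sample_active` is a hypothetical helper, and treating the weight as a multiplier on the base probability (capped at 1) is one of several reasonable choices.

```python
import random


def sample_active(agent_weights, base_probability, rng):
    """Pick the agents that act this timestep.

    agent_weights maps agent id -> weight. Each agent is activated
    independently with probability min(1, base_probability * weight).
    """
    active = []
    for agent_id, weight in agent_weights.items():
        p = min(1.0, base_probability * weight)
        if rng.random() < p:
            active.append(agent_id)
    return active


rng = random.Random(0)  # fixed seed so a run can be repeated
weights = {f"user_{i}": 1.0 for i in range(100_000)}
active = sample_active(weights, base_probability=0.01, rng=rng)
print(len(active))  # close to 1,000; the exact count depends on the seed
```

Seeding the generator matters for the repeatable experiments the framework targets: the same seed yields the same activation sequence, so differences between runs can be attributed to the variable under study.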
The recommendation system is crucial for realistic simulation. OASIS implements two algorithms: interest-based matching (content is shown to users with similar interest vectors) and hot-score-based ranking (modeled on Reddit's hot ranking, which combines net votes with time decay). This creates the feedback loops essential to social media dynamics: popular content gets more exposure, leading to more engagement, which tends to produce heavy-tailed, power-law-like distributions of virality.
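For readers who want to see the two ranking signals in code, here is a small sketch. The hot score follows the formula from Reddit's formerly open-source ranking code, which uses only net votes and post age; whether OASIS uses exactly these constants is an assumption. The cosine similarity function is a generic stand-in for interest matching, and both function names are illustrative.

```python
import math

REDDIT_EPOCH = 1134028003  # time offset (seconds) from Reddit's published code


def hot_score(ups, downs, created_utc):
    """Hot rank: log-scaled net votes plus a bonus for newer posts."""
    net = ups - downs
    order = math.log10(max(abs(net), 1))
    sign = (net > 0) - (net < 0)  # 1, 0, or -1
    age_bonus = (created_utc - REDDIT_EPOCH) / 45000
    return round(sign * order + age_bonus, 7)


def cosine_similarity(a, b):
    """Similarity between a user interest vector and a post vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# A 10x lead in net votes is worth the same as being 45,000 s (12.5 h) newer.
older_popular = hot_score(100, 0, REDDIT_EPOCH)
newer_modest = hot_score(10, 0, REDDIT_EPOCH + 45000)
print(older_popular, newer_modest)  # 2.0 2.0
```

The logarithm is what produces the rich-get-richer dynamic with diminishing returns: the first ten votes count as much as the next ninety, and recency eventually outweighs any vote lead.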
For data collection, OASIS logs every action to the SQLite backend with full provenance: who did what, when, why (the LLM's reasoning), and the resulting state changes. Researchers can then query this data to analyze network formation patterns, measure information spread velocity, or identify emergent communities. The database schema separates user state, content objects (posts, comments), and relationship edges, making complex social network analysis queries straightforward.
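The kind of query this enables can be sketched with Python's built-in `sqlite3` module. The table and column names below (`user`, `post`, `follow`, `trace`) are a hypothetical schema invented for this example, not the actual OASIS schema, and the rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE post (post_id INTEGER PRIMARY KEY, user_id INTEGER, content TEXT);
CREATE TABLE follow (follower_id INTEGER, followee_id INTEGER);
CREATE TABLE trace (user_id INTEGER, step INTEGER, action TEXT, reason TEXT);
""")
conn.executemany("INSERT INTO user VALUES (?, ?)",
                 [(1, "ana"), (2, "ben"), (3, "cy")])
conn.executemany("INSERT INTO follow VALUES (?, ?)",
                 [(2, 1), (3, 1), (1, 2)])
conn.executemany(
    "INSERT INTO trace VALUES (?, ?, ?, ?)",
    [(1, 0, "create_post", "share a new paper"),
     (2, 1, "repost", "relevant to followers"),
     (3, 2, "repost", "agrees with the claim")],
)

# Follower counts per user, highest first.
followers = conn.execute("""
    SELECT u.name, COUNT(f.follower_id) AS n
    FROM user u LEFT JOIN follow f ON f.followee_id = u.user_id
    GROUP BY u.user_id ORDER BY n DESC, u.name
""").fetchall()

# Reposts per timestep: a crude measure of spread velocity.
reposts = conn.execute("""
    SELECT step, COUNT(*) FROM trace
    WHERE action = 'repost' GROUP BY step ORDER BY step
""").fetchall()

print(followers)  # [('ana', 2), ('ben', 1), ('cy', 0)]
print(reposts)    # [(1, 1), (2, 1)]
```

Keeping the LLM's stated reason next to each action is what allows an analysis to ask why a cascade happened as well as when.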
Gotcha
The elephant in the room is cost. OASIS's claim of supporting "one million agents" is technically accurate but economically absurd for most use cases. Consider the token math with GPT-4o-mini ($0.15 per million input tokens, $0.60 per million output tokens). With 100 agents at 10% activation probability, about 10 agents act per timestep. If a timestep with agents examining 10 feed items each consumes roughly 200K input tokens and 20K output tokens in total (about 20K input and 2K output tokens per active agent), that is $0.03 for input plus $0.012 for output, or about $0.04 per timestep. Run a modest 1,000-timestep simulation and you're at roughly $40. Scale to 10,000 agents at the same activation rate and costs climb into the thousands of dollars for a single experiment of that length, or hundreds for shorter runs. The million-agent scenario would require either massive budgets or extremely sparse activation rates that undermine simulation fidelity.
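The arithmetic is easy to reproduce and adapt to other models. The sketch below uses the prices and token totals quoted above; the per-call figures (20K input and 2K output tokens) are simply those totals divided by 10 active agents, and the assumption that cost scales linearly with the number of active agents is mine.

```python
def simulation_cost(n_agents, activation, steps,
                    in_tokens_per_call=20_000, out_tokens_per_call=2_000,
                    in_price=0.15, out_price=0.60):
    """Estimated USD cost; prices are per million tokens."""
    calls_per_step = n_agents * activation
    cost_per_call = (in_tokens_per_call * in_price
                     + out_tokens_per_call * out_price) / 1_000_000
    return calls_per_step * cost_per_call * steps


for n in (100, 10_000, 1_000_000):
    cost = simulation_cost(n, activation=0.1, steps=1000)
    print(f"{n:>9,} agents: ${cost:,.0f}")
# Output:
#       100 agents: $42
#    10,000 agents: $4,200
# 1,000,000 agents: $420,000
```

Lowering the activation rate or trimming the prompt cuts the bill proportionally, which is why those two parameters dominate any budget estimate.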
Rate limits compound the problem. OpenAI's API rate limits mean that even if you can afford it, you'll hit throughput ceilings. OASIS includes async execution and batching, but you're still constrained by your LLM provider's infrastructure. In practice, most meaningful research will happen in the 100-1,000 agent range, which is still valuable but far from the headline number. Additionally, the platform support is limited—only Twitter and Reddit are implemented, and the mechanics are simplified compared to real platform complexity. There's no ad ecosystem, no algorithmic timeline personalization beyond basic recommendations, and no consideration of bots, coordinated manipulation, or platform moderation policies that shape real social media.
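A common way to stay under a provider's throughput ceiling is to cap the number of in-flight requests. The sketch below shows that pattern with `asyncio.Semaphore`; it is a generic illustration rather than OASIS's own scheduler, and `fake_llm_call` sleeps instead of contacting any API.

```python
import asyncio


async def fake_llm_call(agent_id, tracker):
    """Stand-in for a provider request; sleeps instead of calling an API."""
    tracker["in_flight"] += 1
    tracker["peak"] = max(tracker["peak"], tracker["in_flight"])
    await asyncio.sleep(0.01)
    tracker["in_flight"] -= 1
    return f"{agent_id}: do_nothing"


async def run_step(agent_ids, max_concurrent):
    """Run one timestep's calls with at most `max_concurrent` in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)
    tracker = {"in_flight": 0, "peak": 0}

    async def bounded(agent_id):
        async with semaphore:
            return await fake_llm_call(agent_id, tracker)

    results = await asyncio.gather(*(bounded(a) for a in agent_ids))
    return results, tracker["peak"]


agent_ids = [f"user_{i}" for i in range(20)]
results, peak = asyncio.run(run_step(agent_ids, max_concurrent=5))
print(len(results), peak)  # 20 5
```

A semaphore bounds concurrency, not requests per minute, so a real client would typically pair it with retry and backoff on rate-limit errors.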
Verdict
Use OASIS if: you're conducting academic research on social dynamics and need controlled, repeatable experiments at scales traditional ABM tools can't provide with comparable behavioral realism; you're studying specific phenomena like misinformation cascades, filter bubble formation, or content moderation policy impacts where LLM-based agent reasoning is crucial; you have budget for LLM API costs (hundreds to thousands of dollars per major experiment) and can justify it through research grants or institutional support; or you need to prototype social features and want to simulate user behavior before building production systems.

Skip if: you need real-time or production-grade social platforms (this is a research tool, not a product); your budget is constrained and you can't afford substantial LLM API costs for exploratory work; you require platforms beyond Twitter/Reddit or need high-fidelity modeling of specific platform features; or simple rule-based agents would suffice for your research questions (traditional ABM tools like Mesa or NetLogo will be orders of magnitude cheaper and faster).

OASIS occupies a sweet spot for computational social science researchers who need the emergent complexity that LLMs provide but don't need every detail of production social platforms.