MiroFish: Simulating Thousands of AI Agents to Predict Social Dynamics Before They Happen
Hook
What if you could test a controversial policy decision on thousands of digital citizens before implementing it in the real world? MiroFish makes this possible by creating high-fidelity simulations where AI agents with independent memories and personalities interact to reveal collective behaviors that statistical models can’t predict.
Context
Traditional forecasting excels at linear problems—predicting stock prices from historical trends, estimating product demand from seasonal patterns. But it crumbles when faced with complex adaptive systems where individual behaviors compound into unpredictable collective outcomes. How will public opinion shift after a university scandal? What happens when a new financial regulation interacts with investor psychology? How might a novel’s characters behave if the author had written 40 more chapters?
MiroFish, which has received strategic support and incubation from China’s Shanda Group, tackles this prediction gap through multi-agent simulation at scale. Rather than modeling populations as statistical aggregates, it instantiates thousands of individual agents—each with distinct personalities, long-term memories (via services like Zep Cloud), and behavioral logic powered by LLMs. Feed it seed data (a news article, policy draft, or the first 80 chapters of Dream of the Red Chamber), describe what you want to predict in natural language, and the system constructs a parallel digital world where these agents interact freely. The resulting emergent behaviors—social movements, market panics, narrative arcs—surface insights that traditional models miss entirely. With 47,600+ GitHub stars and backing from a major technology group, this represents a serious attempt to make ‘what-if’ scenario planning accessible beyond academic research labs.
Technical Insight
MiroFish orchestrates simulation through a five-stage pipeline that balances automation with control. Stage one converts your seed data into a knowledge graph using GraphRAG, extracting entities, relationships, and contextual information that ground agents in structured reality. This isn’t just keyword extraction—it builds a semantic foundation that agents query throughout the simulation to maintain factual consistency.
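MiroFish's actual GraphRAG pipeline is LLM-driven, but the structure it produces, entities plus typed relationships that agents can query, can be sketched with plain data classes. Everything below (`Entity`, `Relation`, `KnowledgeGraph`) is illustrative, not the project's API:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Entity:
    name: str
    kind: str  # e.g. "person", "organization", "event"

@dataclass(frozen=True)
class Relation:
    source: str
    predicate: str
    target: str

@dataclass
class KnowledgeGraph:
    entities: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)

    def add_entity(self, e: Entity):
        self.entities[e.name] = e

    def add_relation(self, r: Relation):
        self.relations.append(r)

    def neighbors(self, name: str) -> set:
        """Entities directly connected to `name` -- the kind of lookup an
        agent would make to stay factually grounded in the seed data."""
        out = set()
        for r in self.relations:
            if r.source == name:
                out.add(r.target)
            elif r.target == name:
                out.add(r.source)
        return out

# Toy triples as they might be extracted from a scandal report
kg = KnowledgeGraph()
kg.add_entity(Entity("University", "organization"))
kg.add_entity(Entity("Dean", "person"))
kg.add_entity(Entity("Scandal", "event"))
kg.add_relation(Relation("Dean", "employed_by", "University"))
kg.add_relation(Relation("Scandal", "involves", "Dean"))
print(sorted(kg.neighbors("Dean")))  # ['Scandal', 'University']
```

The point of the semantic layer is exactly this `neighbors`-style traversal: agents don't re-read the raw seed document, they query structured relationships derived from it.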
Stage two generates the agent population. The system performs entity extraction to identify key actors, then uses LLMs to synthesize detailed personality profiles for each agent. Here’s where it gets interesting: these aren’t shallow chatbot personas. Each agent receives both individual memory (personal experiences, relationships) and collective memory (shared cultural context, group knowledge) injected from the knowledge graph. The environment configuration step then defines interaction rules, spatial constraints, and temporal dynamics that govern how agents can influence each other.
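The split between individual and collective memory can be made concrete with a minimal sketch. This is not MiroFish's data model, just one way to picture how both memory types get assembled into an agent's per-round prompt context:

```python
from dataclasses import dataclass, field

@dataclass
class AgentProfile:
    name: str
    persona: str                                           # LLM-synthesized personality sketch
    individual_memory: list = field(default_factory=list)  # personal experiences
    collective_memory: list = field(default_factory=list)  # shared cultural context

    def context_for_prompt(self, limit: int = 3) -> str:
        """Assemble the snippets injected into the agent's LLM prompt each
        round: persona first, then recent personal memories, then shared
        knowledge drawn from the knowledge graph."""
        recent = self.individual_memory[-limit:][::-1]     # most recent first
        return "\n".join(
            [f"[persona] {self.persona}"]
            + [f"[me] {m}" for m in recent]
            + [f"[shared] {m}" for m in self.collective_memory[:limit]]
        )

shared = ["The university issued no statement for three days."]
a = AgentProfile("student_17", "skeptical senior, active on forums",
                 collective_memory=shared)
a.individual_memory.append("Attended the protest on day 2.")
print(a.context_for_prompt())
```

Because collective memory is shared while individual memory diverges per agent, two agents with the same cultural context can still react very differently to the same event.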
The simulation engine itself runs on CAMEL-AI’s OASIS framework, which handles parallel agent execution at scale. During stage three, agents operate autonomously—forming opinions, building relationships, reacting to events. MiroFish automatically parses your prediction requirements and translates them into simulation parameters: how many rounds to run, which variables to track, what intervention points to enable. As the simulation progresses, the configured memory backend (such as Zep Cloud in the default setup) maintains temporal memory for each agent, allowing them to reference past interactions and evolve believably over simulated time. The system supports dynamic intervention mid-simulation—inject a breaking news event, remove a key agent, modify environmental variables—giving you ‘God’s eye view’ control to test different scenarios.
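The round loop with mid-simulation intervention can be pictured as a scheduler that checks, before each round, whether an event is due for injection. This standalone sketch (not the OASIS engine) shows the control flow; `step` stands in for the parallel agent update:

```python
def run_simulation(rounds, interventions, step):
    """Advance the world one round at a time, injecting any intervention
    scheduled for that round before agents act -- a toy version of the
    'God's eye view' control described above."""
    interventions = interventions or {}
    log = []
    for r in range(1, rounds + 1):
        event = interventions.get(r)        # e.g. {3: "official_apology"}
        if event:
            log.append((r, "intervention", event))
        log.append((r, "round", step(r, event)))
    return log

# Toy step function: agents just note whether an event reached them this round
log = run_simulation(5, {3: "official_apology"},
                     step=lambda r, ev: f"agents react to {ev or 'nothing new'}")
for entry in log:
    print(entry)
```

Running the same loop with `interventions=None` gives the counterfactual baseline, which is exactly the apology-versus-silence comparison in the example below.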
Here’s a simplified example of how you might configure a simulation environment (based on the project’s architecture):
```python
# Conceptual example based on MiroFish's workflow
from mirofish import SimulationEngine, AgentConfig

# Stage 1-2: Knowledge graph already built from seed data
engine = SimulationEngine(
    llm_config={
        'api_key': 'your_qwen_plus_key',
        'base_url': 'https://dashscope.aliyuncs.com/compatible-mode/v1',
        'model': 'qwen-plus'
    },
    memory_backend='zep_cloud',
    zep_api_key='your_zep_key'
)

# Stage 3: Define prediction goal in natural language
simulation = engine.create_simulation(
    seed_data='university_scandal_report.pdf',
    prediction_query='How will public opinion evolve over 2 weeks if the university issues an apology versus staying silent?',
    agent_count=5000,
    simulation_rounds=30  # Keep under 40 for cost management per documentation
)

# Run base scenario
results_apology = simulation.run(intervention={'day_3': 'official_apology'})

# Run counterfactual
results_silence = simulation.run(intervention=None)

# Stage 4: Generate comparative analysis
report = simulation.generate_report(
    scenarios=[results_apology, results_silence],
    focus_areas=['sentiment_trends', 'influential_agents', 'narrative_clusters']
)
```
Stage four hands off to a specialized ReportAgent equipped with tools to query the post-simulation environment. Unlike a simple summary, this agent can interrogate individual agents about their decision-making, trace influence cascades through the social network, and identify critical junctures where small changes would have produced different outcomes. The reports aren’t just charts—they include narrative explanations of why certain patterns emerged.
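The ReportAgent's influence-cascade tracing amounts to a reachability walk over the simulated social graph. This standalone breadth-first sketch (not MiroFish's implementation) shows the idea:

```python
from collections import deque

def influence_cascade(edges, seed):
    """Breadth-first trace of which agents an opinion could reach, and in
    how many hops, starting from the agent who first voiced it."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    reached = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in reached:
                reached[nxt] = reached[node] + 1
                queue.append(nxt)
    return reached

# Who saw agent_0's post, directly or through reshares?
edges = [("agent_0", "agent_1"), ("agent_1", "agent_2"),
         ("agent_0", "agent_3"), ("agent_4", "agent_5")]
print(influence_cascade(edges, "agent_0"))
# agent_4 and agent_5 never appear: a disconnected cluster the report would flag
```

The hop counts are what make "critical juncture" analysis possible: removing a single hub agent and re-running the trace immediately shows how far the cascade shrinks.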
Stage five opens the digital world for exploration. The Node.js frontend (served on port 3000) provides interfaces to chat with any individual agent, examine their memory logs, or continue dialogues with the ReportAgent to drill into specific findings. This micro-level interaction is crucial—you can verify that agents behave believably by checking whether their stated motivations align with their simulated actions.
The architecture’s split between Python backend (Flask API on port 5001) and Node.js frontend follows standard full-stack patterns, but the memory management deserves attention. In the default configuration, Zep Cloud handles the heavy lifting of maintaining coherent long-term memory for potentially thousands of agents across extended simulations. This is non-trivial: each agent needs to recall relevant past interactions without context windows exploding, and Zep’s hybrid search (vector + keyword + temporal) makes this tractable. The reliance on external memory infrastructure in the documented setup is both a strength (you don’t manage scaling) and a consideration (ongoing service dependencies, potential costs beyond LLM API bills).
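Zep's actual hybrid search is far more sophisticated, but the core scoring idea, lexical relevance blended with temporal decay so agents surface memories that are both relevant and recent, can be approximated in a few lines. All weights and the half-life here are illustrative assumptions:

```python
import math

def hybrid_score(query, memory, age_rounds, recency_half_life=10.0):
    """Blend keyword overlap with exponential recency decay -- a rough
    stand-in for vector + keyword + temporal retrieval."""
    q, m = set(query.lower().split()), set(memory.lower().split())
    overlap = len(q & m) / max(len(q), 1)
    recency = math.exp(-age_rounds * math.log(2) / recency_half_life)
    return 0.7 * overlap + 0.3 * recency

memories = [
    ("the dean apologized publicly", 1),   # recent and relevant
    ("the dean apologized publicly", 25),  # same text, stale
    ("lunch menu changed on tuesday", 1),  # recent but irrelevant
]
query = "did the dean apologize"
ranked = sorted(memories, key=lambda m: -hybrid_score(query, m[0], m[1]))
print(ranked[0])  # the recent, relevant memory wins
```

Whatever the retrieval mechanism, the payoff is the same: each agent's prompt carries only its top-scoring memories, so context windows stay bounded even across long simulations.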
Gotcha
The documentation is refreshingly honest about computational costs, warning users to start with fewer than 40 simulation rounds while testing. This isn’t a casual tool: running thousands of agents through dozens of interaction cycles, with LLM calls at each decision point, can generate substantial API bills. The repository’s recommended configuration uses Alibaba’s Qwen-Plus model via DashScope, and even with a cost-optimized LLM, a serious simulation can consume significant API credits depending on scale and complexity.
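A back-of-envelope calculation makes the scaling obvious. The token count and per-million-token price below are illustrative placeholders, not DashScope's actual pricing:

```python
def estimate_llm_cost(agents, rounds, calls_per_agent_round=1.0,
                      tokens_per_call=1500, price_per_million_tokens=0.8):
    """Rough API cost: assume each agent makes about one LLM call per
    round, each consuming prompt + completion tokens.  All rates here
    are assumptions for illustration only."""
    total_tokens = agents * rounds * calls_per_agent_round * tokens_per_call
    return total_tokens / 1_000_000 * price_per_million_tokens

# 5,000 agents x 30 rounds, as in the configuration example above
print(f"~${estimate_llm_cost(5000, 30):,.2f}")  # ~$180.00 at these assumed rates
```

The cost is linear in agents × rounds, which is why the documentation's under-40-rounds advice matters: doubling rounds doubles the bill, before counting memory-service fees.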
The default setup relies on paid external services beyond just compute. Zep Cloud is configured for agent memory in the documented examples—while they offer a free tier that the README states ‘can support simple use,’ scaling up to thousands of agents with rich interaction histories may require paid plans. Users should evaluate whether the configured memory backend meets their scaling needs and budget constraints.
The bilingual documentation (Chinese-first with English translations) is generally clear, but examples skew heavily toward Chinese contexts—the Wuhan University (武汉大学) public opinion case, Dream of the Red Chamber literary analysis, domestic policy scenarios. International users won’t find pre-built templates for Western contexts, meaning you’ll need to create seed data and validation criteria from scratch. The Docker setup helps with deployment, but the .env.example file doesn’t include detailed guidance on sizing LLM context windows or estimating token consumption per simulation round—users should plan to monitor and budget for API costs during initial testing.
The project is relatively new to widespread attention (based on recent star growth), so community resources, third-party tutorials, and troubleshooting discussions may be limited compared to more established frameworks. The dependency on the OASIS framework means understanding its capabilities and limitations is important for advanced customization.
Verdict
Use MiroFish if you’re making high-stakes decisions where simulating emergent social dynamics could provide valuable insights before real-world implementation—think policy analysts testing regulatory impacts, communications teams gaming scenarios, or researchers studying collective behavior patterns. It’s particularly interesting when you need to compare counterfactual scenarios (‘what if we had responded differently?’) or when stakeholder interactions create non-linear outcomes that spreadsheet models can’t capture. The support from Shanda Group and integration with established infrastructure (OASIS framework, enterprise-grade LLMs, scalable memory services) suggests this is designed for serious workloads beyond simple demos.
Skip it if your forecasting needs are addressable with statistical methods (demand forecasting, trend extrapolation), if you’re operating on limited budgets without room for API costs that scale with simulation complexity, or if you need instant results—simulation setup, knowledge graph construction, and multi-round agent interactions take time. Also consider alternatives if you prefer complete control over dependencies; the documented configuration uses specific external services (Zep Cloud for memory, compatible LLM APIs) that create ongoing operational considerations. For exploring complex adaptive systems where collective intelligence and emergent behaviors matter, it’s a sophisticated open-source tool worth evaluating. For simpler prediction needs, traditional statistical approaches may be more efficient.