Kiroku: When Your AI Becomes the PhD Student and You Become the Advisor
Hook
What if instead of prompting an AI to help you write, you reviewed drafts from an AI 'student' who does the actual writing? Kiroku inverts the entire human-AI collaboration model for academic document creation.
Context
Most AI writing tools position themselves as assistants: you write, they suggest improvements, rephrase sentences, or fill in gaps. This paradigm mirrors the traditional writing process with a helpful sidekick. But anyone who's struggled through writing a research paper knows the real bottleneck isn't polishing prose—it's generating coherent structure from scattered research, maintaining consistent argumentation across sections, and iterating through multiple drafts.
Kiroku attacks this differently. Built on LangGraph, it implements a multi-agent system inspired by PhD advisor-student dynamics. You don't write the first draft; the AI agents do. You provide high-level guidance—hypothesis, structure, domain instructions—and then review what the agents produce, offering feedback that triggers revision cycles. The system orchestrates multiple specialized agents through a pipeline: web search for sources, topic sentence generation, paragraph expansion, section assembly, and reflection-based revision. It's designed specifically for formal academic writing where structure and citations matter more than creative voice.
Technical Insight
Kiroku's architecture centers on a LangGraph workflow that coordinates agents through distinct phases. The process begins with YAML configuration files that specify document structure, hypothesis, and generation parameters. Here's what a minimal configuration looks like:
title: "Multi-Agent Systems for Document Generation"
hypothesis: "Collaborative AI agents can produce more coherent academic writing than single-model approaches"
sections:
  - name: "Introduction"
    instructions: "Define multi-agent systems and their application to NLP tasks"
    min_paragraphs: 2
    max_paragraphs: 3
  - name: "Methodology"
    instructions: "Describe the agent coordination mechanisms"
    min_paragraphs: 3
    max_paragraphs: 5
temperature: 0.7
revision_count: 2
The system executes a structured pipeline. First, a research agent uses Tavily's search API to gather relevant sources based on your hypothesis and section instructions, so the draft is grounded in retrievable references rather than in whatever the model happens to recall. Next, a planning agent generates topic sentences for each section; these act as scaffolding for the document's argumentative structure. A writer agent then expands each topic sentence into full paragraphs, maintaining coherence with the hypothesis and section-level instructions.
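To make the research, plan, write sequence concrete, here is a minimal sketch in plain Python. It is not Kiroku's code: the names `Section`, `search_sources`, `plan_topic_sentences`, `write_paragraph`, and `run_pipeline` are hypothetical, and the search and LLM calls are replaced with deterministic stubs so that only the control flow is on display.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    name: str
    instructions: str
    topic_sentences: list[str] = field(default_factory=list)
    paragraphs: list[str] = field(default_factory=list)


def search_sources(hypothesis: str, section: Section) -> list[str]:
    """Stub for the research step (a real system would call a search API here)."""
    return [f"source about '{section.name}' for: {hypothesis}"]


def plan_topic_sentences(section: Section, sources: list[str], count: int) -> list[str]:
    """Stub for the planning step: one topic sentence per planned paragraph."""
    return [
        f"{section.name}, point {i + 1} (grounded in {len(sources)} source(s))"
        for i in range(count)
    ]


def write_paragraph(topic_sentence: str, hypothesis: str) -> str:
    """Stub for the writer step: expand a topic sentence into a paragraph."""
    return f"{topic_sentence}. This supports the hypothesis that {hypothesis}."


def run_pipeline(hypothesis: str, sections: list[Section], paragraphs_per_section: int = 2) -> str:
    """Research, plan, then write each section, and assemble the full draft."""
    parts = []
    for section in sections:
        sources = search_sources(hypothesis, section)
        section.topic_sentences = plan_topic_sentences(section, sources, paragraphs_per_section)
        section.paragraphs = [write_paragraph(t, hypothesis) for t in section.topic_sentences]
        parts.append(section.name + "\n" + "\n\n".join(section.paragraphs))
    return "\n\n".join(parts)
```

Calling `run_pipeline("agents improve coherence", [Section("Introduction", "Define terms")])` returns an assembled draft and leaves the intermediate topic sentences on each `Section`, which is the scaffolding role the planning step plays.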
What makes Kiroku genuinely multi-agent is the reflection layer. After generating a complete draft, a critic agent evaluates the output against criteria like logical flow, citation quality, and alignment with instructions. This reflection produces specific feedback—not just 'improve clarity' but actionable observations like 'paragraph 3 doesn't connect to the hypothesis' or 'missing transition between sections 2 and 3'. The writer agent then revises based on this critique, creating an iterative refinement loop.
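The critique-and-revise loop can be sketched as follows. This is an illustration under stated assumptions, not Kiroku's implementation: the critic here is a trivial keyword check standing in for an LLM, the names `Critique`, `critique_draft`, `revise_draft`, and `reflect_and_revise` are hypothetical, and stopping early when the critic has nothing to say is a design choice of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Critique:
    paragraph_index: int
    issue: str


def critique_draft(paragraphs: list[str], hypothesis_keyword: str) -> list[Critique]:
    """Stub critic: flag paragraphs that never mention the hypothesis keyword."""
    return [
        Critique(i, f"paragraph {i + 1} doesn't connect to the hypothesis")
        for i, text in enumerate(paragraphs)
        if hypothesis_keyword.lower() not in text.lower()
    ]


def revise_draft(paragraphs: list[str], critiques: list[Critique], hypothesis_keyword: str) -> list[str]:
    """Stub writer: address each critique by tying the paragraph back to the hypothesis."""
    revised = list(paragraphs)  # copy so the caller's draft is not mutated
    for c in critiques:
        revised[c.paragraph_index] += f" This bears on {hypothesis_keyword}."
    return revised


def reflect_and_revise(paragraphs: list[str], hypothesis_keyword: str, revision_count: int = 2):
    """Run up to `revision_count` critique-and-revise cycles.

    Returns the final paragraphs and the list of critiques from each cycle.
    Stops early (an assumption of this sketch) once the critic finds no issues.
    """
    history = []
    for _ in range(revision_count):
        critiques = critique_draft(paragraphs, hypothesis_keyword)
        history.append(critiques)
        if not critiques:
            break
        paragraphs = revise_draft(paragraphs, critiques, hypothesis_keyword)
    return paragraphs, history
```

Keeping each critique as a structured item that points at a specific paragraph is what separates actionable feedback from a generic "improve clarity" note: the reviser knows exactly where to act.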
The LangGraph orchestration handles state management across these agents. Each agent operates on a shared document state that includes the current draft, gathered sources, revision history, and critique feedback. This shared context prevents the common multi-agent problem of agents working at cross-purposes or losing track of previous decisions.
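LangGraph graphs are built around a state schema, with each node reading the state and returning a partial update. The sketch below mimics that pattern with the standard library only, so it runs without LangGraph installed. The field names in `DocumentState` follow the description above but are assumptions, as are the node functions and the `run_nodes` helper.

```python
from typing import Callable, TypedDict


class DocumentState(TypedDict):
    draft: str
    sources: list[str]
    revision_history: list[str]
    critique: list[str]


# A node reads the shared state and returns only the keys it wants to change.
Node = Callable[[DocumentState], dict]


def research_node(state: DocumentState) -> dict:
    return {"sources": state["sources"] + ["stub source"]}


def writer_node(state: DocumentState) -> dict:
    # Preserve the previous draft in the history before replacing it.
    history = state["revision_history"] + ([state["draft"]] if state["draft"] else [])
    new_draft = f"Draft v{len(history) + 1} using {len(state['sources'])} source(s)"
    return {"draft": new_draft, "revision_history": history, "critique": []}


def critic_node(state: DocumentState) -> dict:
    return {"critique": [f"review of: {state['draft']}"]}


def run_nodes(state: DocumentState, nodes: list[Node]) -> DocumentState:
    """Apply nodes in order, merging each partial update into a fresh state dict."""
    for node in nodes:
        state = {**state, **node(state)}
    return state
```

Because every node sees the full state, the writer can consult the sources the researcher gathered and the critique the critic left, and the revision history records what earlier drafts looked like.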
The Gradio interface provides the human-in-the-loop mechanism. After each major phase (initial draft, post-revision draft), you can provide natural language feedback: 'focus more on practical applications' or 'cite more recent sources from 2023-2024'. This feedback gets incorporated into the next generation cycle, functioning like an advisor's comments on a student draft.
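One simple way to fold advisor comments into the next cycle is to append them to the instructions the agents already receive. The helper below is a hypothetical sketch of that idea; how Kiroku actually routes feedback is not described here, and `apply_advisor_feedback` is an invented name.

```python
from typing import Optional


def apply_advisor_feedback(
    instructions: dict[str, str],
    feedback: str,
    section: Optional[str] = None,
) -> dict[str, str]:
    """Return new per-section instructions with the advisor's comment appended.

    If `section` is None, the comment applies to every section.
    The input mapping is never modified.
    """
    feedback = feedback.strip()
    if not feedback:
        return dict(instructions)
    if section is not None and section not in instructions:
        raise KeyError(f"unknown section: {section}")
    return {
        name: f"{text}\nAdvisor feedback: {feedback}" if section in (None, name) else text
        for name, text in instructions.items()
    }
```

For example, `apply_advisor_feedback(instr, "cite more recent sources from 2023-2024")` tags every section, while passing `section="Introduction"` limits the comment to that one section.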
The configuration-driven approach means you can tune agent behavior without touching code. Temperature trades creativity against consistency. Revision count sets how many critique-and-revise cycles run automatically. Section-level instructions let you specify a different writing style or technical depth for each part of the document. This flexibility matters for academic writing, where an introduction demands different treatment than a methodology or results section.
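Since so much behavior hangs on the configuration, it is worth checking it before spending API calls. The function below is a hypothetical pre-flight validator, not part of Kiroku. It takes the already-parsed configuration as a dict using the key names from the example above; the 0 to 2 temperature range is an assumption based on the range OpenAI's chat API accepts, and the defaults for missing keys are this sketch's own.

```python
def validate_config(config: dict) -> list[str]:
    """Return human-readable problems; an empty list means the config looks sane."""
    problems = []
    for key in ("title", "hypothesis"):
        if not config.get(key):
            problems.append(f"missing or empty '{key}'")

    temperature = config.get("temperature", 0.7)
    if not isinstance(temperature, (int, float)) or not 0.0 <= temperature <= 2.0:
        problems.append("temperature must be a number between 0 and 2")

    revision_count = config.get("revision_count", 0)
    if not isinstance(revision_count, int) or revision_count < 0:
        problems.append("revision_count must be a non-negative integer")

    sections = config.get("sections") or []
    if not sections:
        problems.append("at least one section is required")
    for i, section in enumerate(sections):
        label = section.get("name") or f"section #{i + 1}"
        if not section.get("name"):
            problems.append(f"{label}: missing 'name'")
        low = section.get("min_paragraphs", 1)
        high = section.get("max_paragraphs", low)
        if low < 1 or high < low:
            problems.append(f"{label}: need 1 <= min_paragraphs <= max_paragraphs")
    return problems
```

Returning a list of problems rather than raising on the first one lets you fix a config in a single pass.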
Gotcha
Kiroku's external dependencies create immediate friction. You need both OpenAI and Tavily API keys, so every document you generate incurs API costs. There's no support for local models, open-source alternatives, or even other commercial providers like Anthropic or Cohere. For academics on tight budgets, or researchers whose organizations restrict sending data to third-party services, this is a non-starter.
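A small check before launching can save a failed run halfway through a draft. The sketch below assumes the keys are supplied through the environment variables `OPENAI_API_KEY` and `TAVILY_API_KEY`, the names conventionally read by the OpenAI and Tavily client libraries; confirm against Kiroku's own setup instructions. The helper `missing_api_keys` is hypothetical.

```python
import os

# Assumed variable names; verify against the project's setup instructions.
REQUIRED_KEYS = ("OPENAI_API_KEY", "TAVILY_API_KEY")


def missing_api_keys(env=None) -> list[str]:
    """Return the names of required keys that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        raise SystemExit("Missing API keys: " + ", ".join(missing))
    print("Both API keys are set.")
```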
The documentation reveals rough edges typical of early-stage projects. Python version support explicitly caps at 3.11, and the Gradio interface has image handling quirks requiring specific file path syntax ('/file=images/...'). More critically, while the system claims to be multi-agent, the actual agent implementations, prompt engineering strategies, and coordination logic remain black boxes. You can configure outputs through YAML, but you can't easily customize how agents reason, modify the critique criteria, or add new agent types. The extensibility story isn't clear. There's also the fundamental question of quality: iterative refinement by AI critics can amplify biases or produce technically correct but intellectually shallow writing that reads well but says little.
Verdict
Use if: You're writing formal academic papers or technical documentation where structure and citations matter, you're comfortable with paid API dependencies, and you want to offload the painful first-draft generation while maintaining editorial control through iterative feedback. The advisor-student paradigm genuinely fits the academic workflow, and the YAML configuration provides enough control for most research writing scenarios.
Skip if: You need offline operation, want to use local or alternative LLM providers, require deep customization of agent logic beyond configuration files, work in domains requiring creative voice over formal structure, or need production-ready stability. For those cases, look at general frameworks like AutoGen or CrewAI for full control, or stick with traditional writing tools augmented by direct LLM interaction through Claude Projects or ChatGPT, where you stay in the driver's seat.