Models Overview: Writing Agents and Their Foundations

This document provides an overview of the large language models used inside the Writers Factory. Each agent in the system is powered by a different model, which means they think, write, and respond differently. Understanding these differences will help you choose the right agent for the right task.

The purpose of the course is not to produce a perfect novel but to explore context engineering—learning how instructions, constraints, and narrative context shape what an AI can do. You can complete all exercises with any model, but exploring multiple models will deepen your understanding of how they vary.


Why Multiple Models?

Different language models have different strengths:

  • Some excel at narrative writing (tone, style, pacing).
  • Some are strong researchers, pulling together information and structure.
  • Some are more literal or analytical, making them good for outlining or checking coherence.
  • Some specialize by language, such as Russian-language support.

Because the Writers Factory is agent-based, each agent behaves differently depending on the model behind it. This variety is intentional.


Two “Tournaments”: Choosing Your Voices

As part of the course, you will run two small “tournaments” among your agents:

1. Narration & Voice Tournament

Early in the process, you will send identical prompts to several agents to compare how each one handles narration, tone, and stylistic choices. This helps you discover which model best matches the voice you want for your novel.

2. Scene-Writing Tournament

Later, during scene creation, you can give the same scene prompt to multiple agents. By comparing their interpretations—structure, detail, pacing—you get a clear sense of which agents perform best for specific types of scenes.

These tournaments are not about competition but exploration. They give you a practical feel for model differences and help you choose which agents should carry the main narrative load.


Model Access During the MVP Trial

The Writers Factory is designed to run Cloud-First for the best creative results.

  • Standard Track (Cloud):
    • Bring Your Own Key: Required for premium US models (Claude Sonnet, GPT-4o, Grok).
    • Best Value (Included): Gemini 2.0 Flash is provided with embedded keys - no setup required. It has the top value score (quality 9 at ~$0.19 per 1M tokens, as a weighted average of input and output cost), and the orchestrator automatically selects it for “balanced”-tier tasks.
    • Keys Included: Gemini 2.0 Flash + European/Asian models are provided for all course modules.
  • Expert Track (Local):
    • Ollama Integration: Advanced users with powerful hardware (16GB+ RAM, NVIDIA GPU, or Apple Silicon) can run local models.
    • This is now an opt-in feature found in Settings > Advanced > Expert Mode.
    • Hardware Scan: The application will auto-detect your system specs and recommend safe models (e.g., stopping you from loading a 70B model on a laptop).

Full Model Index (by Region)

— US (Bring Your Own Key) —

  • OpenAI (GPT-4o, o1-preview)
  • Anthropic (Claude Sonnet 4)
  • xAI (Grok)

— MVP Trial (Keys Included) —

These models are provided free during the course trial:

  • Google (Gemini 2.0 Flash) - Best value score, default for balanced tier
  • DeepSeek (DeepSeek V3 / R1)
  • Alibaba (Qwen 2.5)
  • Moonshot (Kimi)
  • Zhipu AI (ChatGLM)
  • Mistral AI (Mistral Large, Pixtral)
  • Yandex AI (YandexGPT 5.1 Pro) - Best for Russian

— Local Models (Advanced) —

For privacy or offline use, you can run models on your own machine.

  • Setup: Go to Settings > Local Models to scan your hardware.
  • Smart Recommendations: The app will detect your RAM/GPU and only show models that fit your system (e.g., hiding 70B models on laptops); a sketch of this filtering appears after the note below.
  • Ollama Required: You must install Ollama first. The settings panel will guide you through this.

Common configurations:

  • Llama 3 (Standard 8B) - Requires ~5GB RAM
  • Mistral (Standard 7B) - Requires ~4GB RAM
  • Mixtral (High-End 8x7B) - Requires ~24GB RAM

Note: For the “One-Week Novel” course, we strongly recommend using a Cloud model (even a cheap one) over a small local model; 3B-parameter models often struggle with narrative consistency.
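
To illustrate the Smart Recommendations behavior, here is a minimal sketch of RAM-based filtering, assuming psutil for hardware detection and the approximate RAM figures from the configurations above; it is not the app's actual implementation.

# Hypothetical sketch of RAM-based model filtering (not the shipped logic).
import psutil

# Approximate requirements taken from the configurations listed above (GB).
LOCAL_MODELS = {
    "llama3:8b": 5,
    "mistral:7b": 4,
    "mixtral:8x7b": 24,
}

def recommend_local_models() -> list[str]:
    """Return only the models that fit in this machine's RAM."""
    available_gb = psutil.virtual_memory().total / 1024**3
    return [name for name, required_gb in LOCAL_MODELS.items()
            if required_gb <= available_gb]

print(recommend_local_models())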


Using Models for Writing, Research, and More

As you work through the course, you may find that:

  • Gemini 2.0 Flash is the default “balanced” model - best quality-per-dollar ratio.
  • Claude-based agents are strong narrative stylists (premium tier).
  • OpenAI-based agents provide strong reasoning and structured output (premium tier).
  • DeepSeek/Qwen offer competitive capabilities for analysis and planning.
  • Yandex is the best option for Russian-language writing.

The orchestrator automatically selects the optimal model based on your quality tier setting.


Why This Matters for the One-Week Novel

The course uses these models to demonstrate how much of writing is actually structural thinking and context design.

The novel you create is secondary—the real objective is mastering context engineering through deliberate experiments with multiple agents.


For Engineers: Model Capability System

Writers Factory uses a capability matrix to intelligently route tasks to appropriate models. This enables graceful fallback and cost optimization.

Key Files

  • backend/services/model_capabilities.py - Capability definitions, tier system, fallback logic
  • backend/services/model_orchestrator.py - Task routing, quality tier selection
  • backend/services/llm_service.py - Unified LLM abstraction layer
  • agents.yaml - Agent-to-model assignments

Capability Matrix Structure

# From model_capabilities.py
MODEL_CAPABILITIES = {
    "claude-sonnet-4-5": {
        "context_window": 200000,
        "prompt_tier": "full",           # full | medium | minimal
        "supports_xml": True,
        "supports_json": True,
        "quality_tier": "premium",       # budget | balanced | premium
        "strengths": ["narrative", "reasoning", "instruction_following"],
        "cost_per_1k_tokens": 0.003,
    },
    "llama3.2:3b": {
        "context_window": 32000,
        "prompt_tier": "minimal",
        "supports_xml": False,
        "quality_tier": "budget",
        "strengths": ["fast", "local"],
        "cost_per_1k_tokens": 0.0,
    },
    # ... more models
}
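
To make the graceful fallback mentioned above concrete, here is a minimal sketch of tier-based degradation against this matrix. The tier order and the cheapest-first rule are assumptions for illustration; the real logic lives in model_capabilities.py.

# Illustrative fallback sketch (assumed behavior, not copied from model_capabilities.py).
TIER_ORDER = ["premium", "balanced", "budget"]

def fallback_model(capabilities, requested_tier, available):
    """Pick an available model at the requested tier, degrading to cheaper tiers."""
    for tier in TIER_ORDER[TIER_ORDER.index(requested_tier):]:
        # Within a tier, prefer the cheapest model that is actually reachable.
        candidates = sorted(
            (m for m, caps in capabilities.items()
             if caps["quality_tier"] == tier and m in available),
            key=lambda m: capabilities[m]["cost_per_1k_tokens"],
        )
        if candidates:
            return candidates[0]
    return None

# Premium requested, but only the local model is reachable:
print(fallback_model(MODEL_CAPABILITIES, "premium", {"llama3.2:3b"}))
# -> "llama3.2:3b"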

Prompt Tier Assembly

Different models receive different prompt complexity:

  • full (context budget 128K+) - Complete 7-layer sandwich, all guardrails, full voice bundle
  • medium (context budget 32K-128K) - Summarized process map, core protocols, condensed voice
  • minimal (context budget <32K) - Identity + mode only, compressed, minimal KB entries
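
As a rough illustration, a model's tier could be derived from its context window with a helper like the one below. This is a sketch, not the shipped code; the boundary handling is an assumption chosen so that a 32K model such as llama3.2:3b lands in “minimal”, matching the capability matrix above.

# Hypothetical tier selection from context window size (thresholds per the table above).
def prompt_tier(context_window: int) -> str:
    if context_window >= 128_000:
        return "full"      # complete 7-layer sandwich, all guardrails
    if context_window > 32_000:
        return "medium"    # summarized process map, condensed voice
    return "minimal"       # identity + mode only

assert prompt_tier(200_000) == "full"     # matches claude-sonnet-4-5 above
assert prompt_tier(32_000) == "minimal"   # matches llama3.2:3b above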

Quality Tier Routing

The Model Orchestrator routes tasks based on requirements using a value score formula:

# Value score = quality_score / avg_cost
# avg_cost = (cost_per_1m_input * 0.7 + cost_per_1m_output * 0.3)
#
# Example results:
# - Gemini 2.0 Flash: 9 / 0.19 = 47.4 ✓ (wins balanced tier)
# - DeepSeek V3: 9 / 0.52 = 17.3

# Example: Select model for outline analysis
from backend.services.model_orchestrator import orchestrator, SelectionCriteria

criteria = SelectionCriteria(
    task_type="structural_planning",
    quality_tier="balanced"  # or "budget", "premium", "opus"
)
model_id = orchestrator.select_model(criteria)
# Returns: "gemini-2.0-flash-exp" for balanced tier
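
The value score formula is simple enough to check by hand. The helper below reproduces both example results; the per-million-token input/output prices are illustrative figures chosen to match the averages quoted in the comments.

# Value score = quality_score / weighted average cost per 1M tokens.
def value_score(quality, cost_in_per_1m, cost_out_per_1m):
    avg_cost = cost_in_per_1m * 0.7 + cost_out_per_1m * 0.3
    return quality / avg_cost

# Illustrative prices (USD per 1M tokens) consistent with the averages above.
print(round(value_score(9, 0.10, 0.40), 1))  # Gemini 2.0 Flash -> 47.4
print(round(value_score(9, 0.27, 1.10), 1))  # DeepSeek V3 -> 17.3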

Adding a New Model

  1. Add capability entry to MODEL_CAPABILITIES in model_capabilities.py (see the example after this list)
  2. Add provider integration to llm_service.py (if new provider)
  3. Configure in agents.yaml for agent assignments
  4. Test with pytest backend/tests/test_model_capabilities.py
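
Step 1 might look like the entry below; the model ID and every value are hypothetical placeholders, not real measurements.

# Hypothetical new entry in MODEL_CAPABILITIES (model_capabilities.py).
MODEL_CAPABILITIES["example-provider/new-model"] = {
    "context_window": 128000,
    "prompt_tier": "full",
    "supports_xml": True,
    "supports_json": True,
    "quality_tier": "balanced",
    "strengths": ["narrative", "fast"],
    "cost_per_1k_tokens": 0.001,  # placeholder pricing
}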

API Endpoints

  • GET /orchestrator/capabilities - List all model capabilities
  • GET /orchestrator/tier-info - Get dynamic tier info with selected models
  • GET /orchestrator/recommendations/{task} - Get model recommendations per tier
  • POST /orchestrator/estimate-cost - Estimate monthly cost for a tier
  • POST /orchestrator/route-chat - Route a chat message to the optimal model
  • GET /orchestrator/current-spend - Get current month spending
  • GET /api-keys/status - Check API key availability
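
For a quick smoke test, the endpoints can be exercised with Python's requests library. The base URL and the estimate-cost payload shape are assumptions; check your running backend for the actual schema.

import requests

BASE = "http://localhost:8000"  # assumed local address; adjust to your deployment

# List all model capabilities.
print(requests.get(f"{BASE}/orchestrator/capabilities", timeout=10).json())

# Estimate monthly cost for a tier (payload shape is an assumption).
resp = requests.post(
    f"{BASE}/orchestrator/estimate-cost",
    json={"quality_tier": "balanced"},
    timeout=10,
)
print(resp.json())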

For more on how models integrate with agents, see Agent Instructions.