Scene Scaffolding: How Writers Factory Builds Context
For Students: This document explains how Writers Factory solves one of the hardest problems in AI-assisted fiction writing: giving the LLM enough context to write a coherent scene without overwhelming it or losing important details.
The Problem: LLMs Are “Amnesiac Geniuses”
Large Language Models are brilliant at writing prose but have no persistent memory. Each API call is stateless. When you ask an LLM to write Scene 5 of your novel, it doesn’t know:
- Who your protagonist is
- What happened in Scenes 1-4
- What your writing voice sounds like
- What the current story beat requires
- What research you’ve done
The naive solution - stuffing everything into the prompt - fails because:
- Context windows have limits (even a 200K-token window fills up fast once a full manuscript, research notes, and a story bible are included)
- LLMs suffer from “lost in the middle” - they focus on the beginning and end, losing middle content
- Unstructured context leads to inconsistent results
Writers Factory’s solution: A scaffold - a structured briefing document that gives the LLM exactly what it needs, organized for optimal comprehension.
The Scaffold: A Strategic Briefing Document
A scaffold is NOT the scene itself. It’s a strategic briefing that tells the LLM:
- What this scene must accomplish (beat function)
- Who appears and their story roles
- What happened before (continuity)
- What voice to use
- What research to draw from
Example Scaffold Structure
## SCENE 1.3: THE FORUM CONFRONTATION
**For Writers Factory Scene Writer**
---
## CHAPTER OVERVIEW
- **Target:** ~2000 words
- **Phase:** Act I: Setup
- **Voice:** Detached intellectual observing chaos he created
## BEAT CONTEXT
- **Beat:** Catalyst (12%)
- **Function:** Disrupt protagonist's ordinary world
- **Type:** false_defeat
## STRATEGIC CONTEXT
- **Core Function:** Serve Catalyst beat: External event forces protagonist to act
- **Conflict:** Lukas's logical worldview vs Marina's emotional challenge
- **Character Goals:** Lukas seeks validation; Marina seeks to expose
## CONTINUITY
**Callbacks:** Opening scene tension, algorithm introduction
**Foreshadowing:** Marina's recording device (setup for Act II)
## CHARACTER ENSEMBLE
### Lukas (Protagonist)
- **Fatal Flaw:** Lacks self-awareness
- **The Lie:** Logic is sufficient for all problems
- **Arc State:** Pre-transformation, fully believes the lie
### Marina (Ally)
- **Role Function:** Voice of Dissent challenges protagonist's assumptions
- **Relationship to Protagonist:** Intellectual friction, mutual respect beneath conflict
- **Core Belief:** Systems must serve humans, not control them
## CHARACTER USAGE INSTRUCTIONS (CRITICAL)
**Protagonist Naming:**
- The protagonist is **Lukas** - they MUST be addressed by name via dialogue
- Other characters should call them "Lukas" or use their name in conversation
**Character Demonstration Requirements:**
- If **Marina** (ally) appears: show belief: "Systems must serve humans, not control them"
- If **Sara** (antagonist) appears: show belief: "If the system allows it, it must be right"
**Anti-Pattern: Passive Presence**
- Characters must DO something that reveals who they are
- Nodding, watching, or silent presence is NOT enough
## PRIOR SCENE SUMMARIES
*Use this context to maintain continuity with earlier scenes.*
### Scene 1
Lukas delivers opening remarks at the Civic Forum...
**Characters:** Lukas, Sara, Dr. Duit
**Key Events:**
- Lukas dismisses emotional concerns as "irrational"
- Algorithm flags attendees for "emotional instability"
### Scene 2
Marina arrives late, observing from the back...
**Characters:** Marina, Rolph
**Key Events:**
- Marina records Lukas's speech on hidden device
- Rolph shows signs of algorithm dependency
## CODEX SNIPPETS (FROM YOUR RESEARCH)
**Query:** "forum public speaking algorithm governance"
> "The Civic Forum was designed as a space for rational discourse, but the
> algorithm's real-time emotional monitoring transformed it into a performance
> evaluated by machine metrics rather than human understanding."
— Source: world_building_notes.md
## RESEARCH INGREDIENTS (RESEARCH GRAPH)
**Characters:** Lukas (The Intellectual archetype), Marina (Voice of Dissent)
**World Elements:** Algorithm governance, social credit implications
**Themes:** Technology vs humanity, self-awareness
## VOICE CALIBRATION
[Gold standard examples and anti-patterns from voice tournament...]
## REQUIREMENTS
- Target: ~2000 words
- Voice: Character actively observing/thinking, NOT AI explaining character
- Metaphors: Rotate domains, avoid saturation, no similes
- Anti-patterns: NO "with [adjective] precision", NO computer psychology
- **NAMING**: Protagonist must be called by name at least once via dialogue
- **CHARACTER ROLES**: Each named character must demonstrate their story function
How the Scaffold is Built: The Data Flow
1. Beat Selection (Frontend → Backend)
When you click a beat in the Binder, the frontend sends:
// DirectorDropdown.svelte
const response = await apiClient.generateScaffold({
chapter_number: 1,
scene_number: 3,
beat_number: 2,
beat_name: "Catalyst",
beat_description: "External event forces protagonist to act"
});
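On the backend, this request lands on a scaffold-generation endpoint. The exact route isn't shown in this document, so the following is only a minimal sketch of its shape, assuming FastAPI and a hypothetical path:

```python
from fastapi import APIRouter
from pydantic import BaseModel

router = APIRouter()

class ScaffoldRequest(BaseModel):
    # Mirrors the fields the frontend sends above.
    chapter_number: int
    scene_number: int
    beat_number: int
    beat_name: str
    beat_description: str

@router.post("/api/scaffold/generate")  # hypothetical path
async def generate_scaffold(request: ScaffoldRequest):
    # Hands the request off to ScaffoldGeneratorService (step 2 below).
    ...
```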
2. Scaffold Generation (scaffold_generator_service.py)
The ScaffoldGeneratorService orchestrates multiple data sources:
async def generate_full_scaffold(
self,
project_id: str,
chapter_number: int,
scene_number: int,
beat_info: BeatInfo,
...
) -> Scaffold:
# 1. Load character ensemble from Story Bible
character_ensemble = []
try:
from backend.services.story_bible_service import StoryBibleService
bible_service = StoryBibleService(path_manager.content_dir)
ensemble = bible_service.get_character_ensemble()
# Returns: [{"name": "Lukas", "role": "protagonist", "fatal_flaw": "...", ...}]
except Exception as e:
logger.warning(f"Could not load character ensemble: {e}")
# 2. Load Codex snippets (semantic search of your research)
codex_snippets = []
try:
from backend.services.knowledge_router import KnowledgeRouter
router = KnowledgeRouter()
codex_query = f"{beat_info.beat_name} {scene_description}"
snippets = router.query_codex(codex_query, top_k=5)
# Returns embedded chunks from your research documents
except Exception as e:
logger.warning(f"Could not query Codex: {e}")
# 3. Load Research Graph ingredients
research_ingredients = None
try:
from backend.services.research_graph_service import ResearchGraphService
rg_service = ResearchGraphService()
research_ingredients = rg_service.get_scene_ingredients(project_id)
# Returns: characters, worlds, themes from Research Graph
except Exception as e:
logger.warning(f"Could not load research ingredients: {e}")
# 4. Load prior scene summaries (Task 90)
prior_scene_context = ""
try:
from backend.services.scene_summary_service import get_scene_summary_service
summary_service = get_scene_summary_service()
prior_summaries = summary_service.get_prior_summaries(
project_id="default_project",
before_scene_number=scene_number # Uses global ordinal!
)
if prior_summaries:
prior_scene_context = summary_service.format_summaries_for_prompt(prior_summaries)
except Exception as e:
logger.warning(f"Could not load prior scene summaries: {e}")
# 5. Assemble the Scaffold dataclass
return Scaffold(
scene_id=f"ch{chapter_number}-sc{scene_number}",
chapter_number=chapter_number,
scene_number=scene_number,
beat_info=beat_info,
character_ensemble=character_ensemble,
codex_snippets=codex_snippets,
research_ingredients=research_ingredients,
prior_scene_context=prior_scene_context,
...
)
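A call site might look roughly like this; the import path, the service constructor, and any `BeatInfo` fields beyond those used above are assumptions:

```python
from backend.services.scaffold_generator_service import ScaffoldGeneratorService, BeatInfo

async def build_scene_3_scaffold():
    scaffold_service = ScaffoldGeneratorService()  # constructor args, if any, omitted
    scaffold = await scaffold_service.generate_full_scaffold(
        project_id="default_project",
        chapter_number=1,
        scene_number=3,
        beat_info=BeatInfo(
            beat_name="Catalyst",
            description="External event forces protagonist to act",
        ),
    )
    return scaffold  # scaffold.scene_id == "ch1-sc3"
```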
3. Prompt Assembly (scene_writer_service.py)
The SceneWriterService converts the Scaffold into an LLM prompt:
def _build_scene_prompt(self, scaffold: Scaffold, voice_context: str, ...) -> str:
"""Assemble the complete prompt from scaffold components."""
base_prompt = f"""## SCENE {scaffold.chapter_number}.{scaffold.scene_number}
...
{self._build_character_ensemble_section(scaffold)}
{self._build_character_usage_instructions(scaffold)}
{self._build_prior_scene_context(scaffold)}
{self._build_codex_section(scaffold)}
{self._build_research_ingredients_section(scaffold)}
{voice_context}
## REQUIREMENTS
- Target: ~{target_word_count} words
- Voice: Character actively observing/thinking, NOT AI explaining character
...
"""
return base_prompt
Each _build_* method extracts and formats one aspect of context:
def _build_character_usage_instructions(self, scaffold: Scaffold) -> str:
"""Task 89: Tell LLM HOW to use character data."""
if not scaffold.character_ensemble:
return ""
lines = ["## CHARACTER USAGE INSTRUCTIONS (CRITICAL)", ""]
# Find protagonist and require naming
protagonist = next(
(c for c in scaffold.character_ensemble if c.get("role") == "protagonist"),
None
)
if protagonist:
name = protagonist.get("name", "the protagonist")
lines.append(f"**Protagonist Naming:**")
lines.append(f"- The protagonist is **{name}** - they MUST be addressed by name")
# For each supporting character, create demonstration instruction
for char in scaffold.character_ensemble:
if char.get("role") == "protagonist":
continue
name = char.get("name", "Unknown")
role = char.get("role", "supporting")
# Use whatever data we have to create specific instruction
if char.get("core_belief"):
demo = f'show belief: "{char["core_belief"]}"'
elif char.get("relationship_to_protagonist"):
demo = f'demonstrate: {char["relationship_to_protagonist"].split(".")[0]}'
else:
demo = f'demonstrate their {role} role through action or dialogue'
lines.append(f"- If **{name}** ({role}) appears: {demo}")
return "\n".join(lines)
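For illustration, here is roughly what this method produces when fed a minimal ensemble. The `SimpleNamespace` stand-in works because only `character_ensemble` is read; `service` is assumed to be an already-instantiated `SceneWriterService`:

```python
from types import SimpleNamespace

scaffold = SimpleNamespace(character_ensemble=[
    {"name": "Lukas", "role": "protagonist", "fatal_flaw": "Lacks self-awareness"},
    {"name": "Marina", "role": "ally",
     "core_belief": "Systems must serve humans, not control them"},
    {"name": "Sara", "role": "antagonist",
     "core_belief": "If the system allows it, it must be right"},
])

print(service._build_character_usage_instructions(scaffold))
# ## CHARACTER USAGE INSTRUCTIONS (CRITICAL)
#
# **Protagonist Naming:**
# - The protagonist is **Lukas** - they MUST be addressed by name
# - If **Marina** (ally) appears: show belief: "Systems must serve humans, not control them"
# - If **Sara** (antagonist) appears: show belief: "If the system allows it, it must be right"
```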
4. Scene Generation (Multi-Model Tournament)
The assembled prompt goes to multiple LLMs simultaneously:
async def generate_scene_variants(self, scaffold: Scaffold, ...) -> List[SceneVariant]:
"""Generate scene variants using tournament approach."""
# Build the prompt once
prompt = self._build_scene_prompt(scaffold, voice_context, ...)
# Send to multiple models in parallel
tasks = []
for provider, model in self.tournament_models:
for strategy in [ACTION, CHARACTER, DIALOGUE, ATMOSPHERIC, BALANCED]:
tasks.append(self._generate_single_variant(
provider, model, strategy, prompt
))
# Collect results
variants = await asyncio.gather(*tasks)
# Auto-score all variants
for variant in variants:
variant.score = await self.scene_analyzer.analyze(variant.content)
return sorted(variants, key=lambda v: v.score, reverse=True)
The Continuity Problem: Scene Numbering
The Bug We Fixed
Scene summaries are stored with a scene_number for ordering. The original code:
# BROKEN: Only extracts scene portion
def _extract_scene_number(self, scene_id: str) -> int:
match = re.search(r'sc(\d+)', scene_id)
if match:
return int(match.group(1)) # act1_ch01_sc01 -> 1
# act1_ch02_sc01 -> 1 COLLISION!
This caused act1_ch01_sc01 and act1_ch02_sc01 to both have scene_number=1, breaking continuity queries.
The Fix: Global Ordinal
# FIXED: Computes global ordinal across acts/chapters
def _extract_scene_number(self, scene_id: str) -> int:
"""
Formula: (act-1)*10000 + (chapter-1)*100 + scene
Examples:
act1_ch01_sc01 -> 1 (0*10000 + 0*100 + 1)
act1_ch01_sc02 -> 2 (0*10000 + 0*100 + 2)
act1_ch02_sc01 -> 101 (0*10000 + 1*100 + 1) <- Now distinct!
act2_ch01_sc01 -> 10001 (1*10000 + 0*100 + 1)
"""
match = re.search(r'act(\d+)_ch(\d+)_sc(\d+)', scene_id)
if match:
act = int(match.group(1))
chapter = int(match.group(2))
scene = int(match.group(3))
return (act - 1) * 10000 + (chapter - 1) * 100 + scene
Now querying scene_number < 101 correctly returns all Chapter 1 scenes.
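As a standalone sanity check, the same computation can be run outside the service against the example IDs:

```python
import re

def extract_scene_number(scene_id: str) -> int:
    # Standalone copy of the fixed logic above, for experimentation.
    match = re.search(r"act(\d+)_ch(\d+)_sc(\d+)", scene_id)
    if not match:
        raise ValueError(f"Unrecognized scene id: {scene_id}")
    act, chapter, scene = (int(g) for g in match.groups())
    return (act - 1) * 10000 + (chapter - 1) * 100 + scene

ids = ["act1_ch01_sc01", "act1_ch01_sc02", "act1_ch02_sc01", "act2_ch01_sc01"]
print([extract_scene_number(i) for i in ids])             # [1, 2, 101, 10001]
print([i for i in ids if extract_scene_number(i) < 101])  # Chapter 1 scenes only
```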
Key Design Principles
1. Structured Over Stuffed
Don’t dump raw text. Organize context into labeled sections the LLM can reference.
2. Instructions Over Data
Data alone isn’t enough. The CHARACTER USAGE INSTRUCTIONS section tells the LLM how to use the character data, not just what the data is.
3. Graceful Degradation
Every data source is wrapped in try/except. If Codex fails, the scene still generates - just without research snippets.
try:
codex_snippets = router.query_codex(query, top_k=5)
except Exception as e:
logger.warning(f"Codex unavailable: {e}")
codex_snippets = [] # Continue without Codex
4. Provenance Tracking
Every piece of context knows where it came from:
from dataclasses import dataclass

@dataclass
class CodexSnippet:
    content: str
    source_file: str
    relevance_score: float

@dataclass
class ResearchIngredient:
    name: str
    source_notebook: str
    source_project: str
5. The “Lost in the Middle” Fix
Critical instructions go at the END of the prompt (REQUIREMENTS, GENRE CALIBRATION) because LLMs attend most to beginnings and endings.
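Conceptually, the assembly order looks like the sketch below. Section names follow the example scaffold earlier in this document; the exact ordering in scene_writer_service.py may differ:

```python
# High-attention zones are the start (scene identity, beat) and the end
# (voice rules, requirements); bulk reference material sits in the middle.
SECTION_ORDER = [
    "CHAPTER OVERVIEW",
    "BEAT CONTEXT",
    "STRATEGIC CONTEXT",
    "CHARACTER ENSEMBLE",
    "CHARACTER USAGE INSTRUCTIONS (CRITICAL)",
    "PRIOR SCENE SUMMARIES",
    "CODEX SNIPPETS (FROM YOUR RESEARCH)",
    "RESEARCH INGREDIENTS (RESEARCH GRAPH)",
    "VOICE CALIBRATION",
    "REQUIREMENTS",  # last: highest attention for the hardest rules
]

def assemble_prompt(sections: dict[str, str]) -> str:
    return "\n\n".join(
        f"## {name}\n{sections[name]}" for name in SECTION_ORDER if name in sections
    )
```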
Database Schema: Where Context Lives
writers_factory.db (Story Graph - Canonical Truth)
├── scenes - Promoted scenes with beat linkage
├── scene_summaries - LLM-extracted summaries for continuity (Task 90)
├── chapters - Chapter metadata
└── beats - Beat sheet structure
research_graph.db (Research Graph - Creative Ingredients)
├── ingredients - Characters, worlds, themes, plot beats
└── provenance - Source tracking for each ingredient
sessions.db (Workspace State)
├── sessions - Chat history
├── foreman_kb - Crystallized decisions
└── settings - User configuration
The Implementation Journey: Tasks 87-94
Building the scaffold prompt was an iterative process. Each task addressed a specific gap between what we wanted and what the LLM actually received.
Task 87: Character Ensemble Injection
Problem: Scene prompts had no character context. The LLM didn’t know who was the protagonist, antagonist, or what their relationships were.
Solution: Parse Story Bible files (Characters/, Relationships_Web.md) and inject a CHARACTER ENSEMBLE section.
Files Modified:
- `story_bible_service.py` - Added `get_character_ensemble()` method
- `scaffold_generator_service.py` - Added `character_ensemble` field to `Scaffold`
- `scene_writer_service.py` - Added `_build_character_ensemble_section()`
Task 88: Beat Enrichment Pipeline
Problem: Beat files were too thin. “Catalyst: External event forces protagonist to act” doesn’t tell the LLM which characters should appear or what should happen.
Solution: On-demand LLM enrichment that adds per-beat character assignments and scene guidance.
Key Insight: This runs BEFORE scene generation, during Voice Calibration, so beats are enriched by the time you reach Director Mode.
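The enriched beat format isn't reproduced in this document, but conceptually the enrichment step turns the thin entry into something like the following (field names are illustrative, not the actual schema):

```python
# Hypothetical enriched beat; actual field names and structure may differ.
enriched_beat = {
    "beat_name": "Catalyst",
    "beat_percentage": 12,
    "beat_type": "false_defeat",
    "description": "External event forces protagonist to act",
    # Added by enrichment:
    "characters": ["Lukas", "Marina"],
    "scene_guidance": (
        "Marina publicly challenges Lukas at the Civic Forum, forcing him "
        "to defend the algorithm in front of the people it has flagged."
    ),
}
```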
Task 89: Character Usage Instructions
Problem: Characters were in the prompt but appeared passively. Sara would “nod approvingly” instead of demonstrating her belief that “if the system allows it, it must be right.”
Solution: Explicit instructions telling the LLM HOW to use character data, not just providing data.
Key Pattern:
## CHARACTER USAGE INSTRUCTIONS (CRITICAL)
**Protagonist Naming:**
- The protagonist is **Lukas** - they MUST be addressed by name via dialogue
**Character Demonstration Requirements:**
- If **Marina** (ally) appears: show belief: "Systems must serve humans"
- If **Sara** (antagonist) appears: show belief: "If the system allows it, it must be right"
Task 90: Prior Scene Summaries
Problem: No continuity. The LLM writing Scene 5 didn’t know what happened in Scenes 1-4.
Solution: Extract and store summaries when scenes are “promoted” (approved), then inject them into subsequent scaffolds.
Database Table: scene_summaries with scene_number (global ordinal), characters, key_events, summary_text.
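The DDL isn't shown in this document, but assuming SQLite (the *.db files suggest it), the table and the continuity query might look like this; column types and constraints are assumptions:

```python
import sqlite3

conn = sqlite3.connect("writers_factory.db")
conn.execute("""
CREATE TABLE IF NOT EXISTS scene_summaries (
    scene_id      TEXT PRIMARY KEY,   -- e.g. 'act1_ch01_sc01'
    scene_number  INTEGER NOT NULL,   -- global ordinal, used for ordering
    characters    TEXT,               -- e.g. JSON list of names
    key_events    TEXT,               -- e.g. JSON list of events
    summary_text  TEXT NOT NULL
)
""")

# Continuity query: everything before Chapter 2, Scene 1 (global ordinal 101).
rows = conn.execute(
    "SELECT summary_text FROM scene_summaries "
    "WHERE scene_number < ? ORDER BY scene_number",
    (101,),
).fetchall()
```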
Task 91: Inter-Character Relationships
Problem: Q7 answers like “love interest is Sara” and “Marina is Sara’s mother” were being parsed but lost before reaching Relationships_Web.md.
Solution: Add skeleton_role, related_to, relationship_type fields and trace them through the entire pipeline:
Q7 parsing → CharacterSeed → CharacterIngredient (DB) → collision_to_seed() → Relationships_Web.md
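On the `CharacterSeed` side, the three threaded fields look roughly like this (an illustrative sketch; the real dataclass has more fields):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CharacterSeed:
    name: str
    skeleton_role: Optional[str] = None      # e.g. "love_interest" for Sara
    related_to: Optional[str] = None         # e.g. "Sara" for Marina
    relationship_type: Optional[str] = None  # e.g. "mother"

# From the two Q7 answers above:
sara = CharacterSeed(name="Sara", skeleton_role="love_interest")
marina = CharacterSeed(name="Marina", related_to="Sara", relationship_type="mother")
```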
Task 92: Scene ID Collision Bug
Problem: Beat 2 was overwriting Scene 1! Both generated act1_ch01_sc01 because generateSceneId() was checking the old Working/ folder (always empty after Task 84 renamed it to Scenes/).
Root Cause: One-line bug in DirectorDropdown.svelte.
Fix: listScenes('Working', ...) → listScenes('Scenes', ...)
Task 93: Q7 Natural Language Parsing
Problem: Q7 parsing patterns didn’t match natural language:
- “love interest IS Sara” → no match (expected “love interest Sara”)
- “between Lukas and his father, Geoff” → no match
Solution: Add three new regex patterns to handle “is”, “between…and”, and “has…with” constructions.
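The actual Task 93 regexes aren't reproduced here, but illustrative patterns for the three constructions might look like this:

```python
import re

RELATIONSHIP_PATTERNS = [
    # "love interest is Sara"
    re.compile(r"(?P<role>[\w ]+?)\s+is\s+(?P<name>\w+)", re.IGNORECASE),
    # "between Lukas and his father, Geoff"
    re.compile(
        r"between\s+(?P<a>\w+)\s+and\s+(?:his|her|their)\s+"
        r"(?P<relation>[\w ]+?),\s*(?P<b>\w+)",
        re.IGNORECASE,
    ),
    # "Lukas has a rivalry with Sara"
    re.compile(
        r"(?P<a>\w+)\s+has\s+(?:an?\s+)?(?P<relation>[\w ]+?)\s+with\s+(?P<b>\w+)",
        re.IGNORECASE,
    ),
]

for text in ["love interest is Sara",
             "between Lukas and his father, Geoff",
             "Lukas has a rivalry with Sara"]:
    for pattern in RELATIONSHIP_PATTERNS:
        match = pattern.search(text)
        if match:
            print(f"{text!r} -> {match.groupdict()}")
            break
```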
Task 94: Beat Context Section
Problem: The example prompt showed a ## BEAT CONTEXT section with beat name, percentage, function, and type. But the actual implementation only baked beat info into the core_function string—losing the structured fields.
Gap Analysis:
# What we had:
core_function=f"Serve {beat_info.beat_name} beat: {beat_info.description}"
# What the LLM needed:
## BEAT CONTEXT
- **Beat:** Catalyst (12%)
- **Function:** Disrupt protagonist's ordinary world
- **Type:** false_defeat
Solution:
- Add `beat_name`, `beat_percentage`, `beat_type`, `beat_function` to the `Scaffold` dataclass
- Populate them from `BeatInfo` during scaffold generation
- Add a `_build_beat_context_section()` method (sketched below)
- Inject it after SCENE CONTEXT in the prompt
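A plausible shape for the new builder method, assuming the four fields above (the actual implementation may differ):

```python
def _build_beat_context_section(self, scaffold: Scaffold) -> str:
    """Render the BEAT CONTEXT block from the fields added in Task 94."""
    if not scaffold.beat_name:
        return ""
    return "\n".join([
        "## BEAT CONTEXT",
        f"- **Beat:** {scaffold.beat_name} ({scaffold.beat_percentage}%)",
        f"- **Function:** {scaffold.beat_function}",
        f"- **Type:** {scaffold.beat_type}",
    ])
```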
Why It Matters: The beat type (false_victory, false_defeat) tells the LLM the emotional trajectory. A Midpoint that’s a false victory needs very different prose than one that’s a false defeat.
Lessons for Students
1. Document First, Then Compare
The example prompt in this document was written BEFORE the implementation was complete. By comparing the ideal prompt to the actual code output, we found gaps (like Task 94’s missing BEAT CONTEXT).
Technique: Write the prompt you WANT the LLM to receive, then trace through the code to see what it ACTUALLY receives.
2. Data Flow Is Everything
Most bugs in Tasks 91-93 were data flow issues—fields were parsed correctly but lost somewhere in the pipeline. Tracing Q7 → parsing → dataclass → database → API → generator → prompt revealed where fields were dropped.
3. Small Bugs, Big Impact
Task 92 was a one-line fix ('Working' → 'Scenes'), but it caused scenes to overwrite each other. Never underestimate path/folder changes.
4. Graceful Degradation
Every data source is wrapped in try/except:
try:
codex_snippets = router.query_codex(query)
except Exception:
codex_snippets = [] # Continue without Codex
The system should work with partial data, not fail if one source is unavailable.
Testing Your Understanding
- Why do we need CHARACTER USAGE INSTRUCTIONS separate from CHARACTER ENSEMBLE?
- ENSEMBLE provides data; INSTRUCTIONS tell the LLM how to use it
- Without instructions, characters appear passively (nodding, watching)
- What’s the global ordinal formula and why does it matter?
- `(act-1)*10000 + (chapter-1)*100 + scene`
- Ensures scenes from different chapters have distinct, ordered numbers
- Why is voice context placed near the end of the prompt?
- LLMs suffer from “lost in the middle” - they attend to start and end
- Critical style guidance needs to be in high-attention zones
- What happens if the Codex query fails?
- Scene generation continues without research snippets
- Graceful degradation ensures the system never blocks on optional features
- Why does beat type (false_victory/false_defeat) matter?
- It tells the LLM the emotional trajectory of the scene
- A Midpoint false victory (“everything is going great”) needs different prose than a false defeat (“first major setback”)
- What technique helped identify Task 94?
- Comparing the documented “ideal prompt” to the actual code output
- The BEAT CONTEXT section was in the docs but not in the implementation
Document created: 2026-01-18
Updated: 2026-01-19 with the Tasks 91-94 implementation journey
Based on the Tasks 87-94 implementation by Claude Code and Claude Cowork (Opus 4.5)