Scene Scaffolding: How Writers Factory Builds Context
For Students: This document explains how Writers Factory solves one of the hardest problems in AI-assisted fiction writing: giving the LLM enough context to write a coherent scene without overwhelming it or losing important details.
The Problem: LLMs Are “Amnesiac Geniuses”
Large Language Models are brilliant at writing prose but have no persistent memory. Each API call is stateless. When you ask an LLM to write Scene 5 of your novel, it doesn’t know:
- Who your protagonist is
- What happened in Scenes 1-4
- What your writing voice sounds like
- What the current story beat requires
- What research you’ve done
The naive solution - stuffing everything into the prompt - fails because:
- Context windows have limits (even a 200K-token window fills up fast once a full manuscript, research notes, and a story bible are included)
- LLMs suffer from “lost in the middle” - they focus on the beginning and end, losing middle content
- Unstructured context leads to inconsistent results
Writers Factory’s solution: A scaffold - a structured briefing document that gives the LLM exactly what it needs, organized for optimal comprehension.
The Scaffold: A Strategic Briefing Document
A scaffold is NOT the scene itself. It’s a strategic briefing that tells the LLM:
- What this scene must accomplish (beat function)
- Who appears and their story roles
- What happened before (continuity)
- What voice to use
- What research to draw from
Example Scaffold Structure
## SCENE 1.3: THE FORUM CONFRONTATION
**For Writers Factory Scene Writer**
---
## CHAPTER OVERVIEW
- **Target:** ~2000 words
- **Phase:** Act I: Setup
- **Voice:** Detached intellectual observing chaos he created
## BEAT CONTEXT
- **Beat:** Catalyst (12%)
- **Function:** Disrupt protagonist's ordinary world
- **Type:** false_defeat
## STRATEGIC CONTEXT
- **Core Function:** Serve Catalyst beat: External event forces protagonist to act
- **Conflict:** Lukas's logical worldview vs Marina's emotional challenge
- **Character Goals:** Lukas seeks validation; Marina seeks to expose
## CONTINUITY
**Callbacks:** Opening scene tension, algorithm introduction
**Foreshadowing:** Marina's recording device (setup for Act II)
## CHARACTER ENSEMBLE
### Lukas (Protagonist)
- **Fatal Flaw:** Lacks self-awareness
- **The Lie:** Logic is sufficient for all problems
- **Arc State:** Pre-transformation, fully believes the lie
### Marina (Ally)
- **Role Function:** Voice of Dissent challenges protagonist's assumptions
- **Relationship to Protagonist:** Intellectual friction, mutual respect beneath conflict
- **Core Belief:** Systems must serve humans, not control them
## CHARACTER USAGE INSTRUCTIONS (CRITICAL)
**Protagonist Naming:**
- The protagonist is **Lukas** - they MUST be addressed by name via dialogue
- Other characters should call them "Lukas" or use their name in conversation
**Character Demonstration Requirements:**
- If **Marina** (ally) appears: show belief: "Systems must serve humans, not control them"
- If **Sara** (antagonist) appears: show belief: "If the system allows it, it must be right"
**Anti-Pattern: Passive Presence**
- Characters must DO something that reveals who they are
- Nodding, watching, or silent presence is NOT enough
## PRIOR SCENE SUMMARIES
*Use this context to maintain continuity with earlier scenes.*
### Scene 1
Lukas delivers opening remarks at the Civic Forum...
**Characters:** Lukas, Sara, Dr. Duit
**Key Events:**
- Lukas dismisses emotional concerns as "irrational"
- Algorithm flags attendees for "emotional instability"
### Scene 2
Marina arrives late, observing from the back...
**Characters:** Marina, Rolph
**Key Events:**
- Marina records Lukas's speech on hidden device
- Rolph shows signs of algorithm dependency
## CODEX SNIPPETS (FROM YOUR RESEARCH)
**Query:** "forum public speaking algorithm governance"
> "The Civic Forum was designed as a space for rational discourse, but the
> algorithm's real-time emotional monitoring transformed it into a performance
> evaluated by machine metrics rather than human understanding."
— Source: world_building_notes.md
## RESEARCH INGREDIENTS (RESEARCH GRAPH)
**Characters:** Lukas (The Intellectual archetype), Marina (Voice of Dissent)
**World Elements:** Algorithm governance, social credit implications
**Themes:** Technology vs humanity, self-awareness
## VOICE CALIBRATION
[Gold standard examples and anti-patterns from voice tournament...]
## REQUIREMENTS
- Target: ~2000 words
- Voice: Character actively observing/thinking, NOT AI explaining character
- Metaphors: Rotate domains, avoid saturation, no similes
- Anti-patterns: NO "with [adjective] precision", NO computer psychology
- **NAMING**: Protagonist must be called by name at least once via dialogue
- **CHARACTER ROLES**: Each named character must demonstrate their story function
How the Scaffold is Built: The Data Flow
1. Beat Selection (Frontend → Backend)
When you click a beat in the Binder, the frontend sends:
// DirectorDropdown.svelte
const response = await apiClient.generateScaffold({
chapter_number: 1,
scene_number: 3,
beat_number: 2,
beat_name: "Catalyst",
beat_description: "External event forces protagonist to act"
});
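On the backend, this request lands on a scaffold-generation endpoint. The exact route isn't shown in this document, so the following is only a minimal sketch of its shape, assuming FastAPI and a hypothetical path:

```python
from fastapi import APIRouter
from pydantic import BaseModel

router = APIRouter()

class ScaffoldRequest(BaseModel):
    # Mirrors the fields the frontend sends above.
    chapter_number: int
    scene_number: int
    beat_number: int
    beat_name: str
    beat_description: str

@router.post("/api/scaffold/generate")  # hypothetical path
async def generate_scaffold(request: ScaffoldRequest):
    # Hands the request off to ScaffoldGeneratorService (step 2 below).
    ...
```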
2. Scaffold Generation (scaffold_generator_service.py)
The ScaffoldGeneratorService orchestrates multiple data sources:
async def generate_full_scaffold(
self,
project_id: str,
chapter_number: int,
scene_number: int,
beat_info: BeatInfo,
...
) -> Scaffold:
# 1. Load character ensemble from Story Bible
character_ensemble = []
try:
from backend.services.story_bible_service import StoryBibleService
bible_service = StoryBibleService(path_manager.content_dir)
ensemble = bible_service.get_character_ensemble()
# Returns: [{"name": "Lukas", "role": "protagonist", "fatal_flaw": "...", ...}]
except Exception as e:
logger.warning(f"Could not load character ensemble: {e}")
# 2. Load Codex snippets (semantic search of your research)
codex_snippets = []
try:
from backend.services.knowledge_router import KnowledgeRouter
router = KnowledgeRouter()
codex_query = f"{beat_info.beat_name} {scene_description}"
snippets = router.query_codex(codex_query, top_k=5)
# Returns embedded chunks from your research documents
except Exception as e:
logger.warning(f"Could not query Codex: {e}")
# 3. Load Research Graph ingredients
research_ingredients = None
try:
from backend.services.research_graph_service import ResearchGraphService
rg_service = ResearchGraphService()
research_ingredients = rg_service.get_scene_ingredients(project_id)
# Returns: characters, worlds, themes from Research Graph
except Exception as e:
logger.warning(f"Could not load research ingredients: {e}")
# 4. Load prior scene summaries (Task 90)
prior_scene_context = ""
try:
from backend.services.scene_summary_service import get_scene_summary_service
summary_service = get_scene_summary_service()
prior_summaries = summary_service.get_prior_summaries(
project_id="default_project",
before_scene_number=scene_number # Uses global ordinal!
)
if prior_summaries:
prior_scene_context = summary_service.format_summaries_for_prompt(prior_summaries)
except Exception as e:
logger.warning(f"Could not load prior scene summaries: {e}")
# 5. Assemble the Scaffold dataclass
return Scaffold(
scene_id=f"ch{chapter_number}-sc{scene_number}",
chapter_number=chapter_number,
scene_number=scene_number,
beat_info=beat_info,
character_ensemble=character_ensemble,
codex_snippets=codex_snippets,
research_ingredients=research_ingredients,
prior_scene_context=prior_scene_context,
...
)
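A call site might look roughly like this; the import path, the service constructor, and any `BeatInfo` fields beyond those used above are assumptions:

```python
from backend.services.scaffold_generator_service import ScaffoldGeneratorService, BeatInfo

async def build_scene_3_scaffold():
    scaffold_service = ScaffoldGeneratorService()  # constructor args, if any, omitted
    scaffold = await scaffold_service.generate_full_scaffold(
        project_id="default_project",
        chapter_number=1,
        scene_number=3,
        beat_info=BeatInfo(
            beat_name="Catalyst",
            description="External event forces protagonist to act",
        ),
    )
    return scaffold  # scaffold.scene_id == "ch1-sc3"
```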
3. Prompt Assembly (scene_writer_service.py)
The SceneWriterService converts the Scaffold into an LLM prompt:
def _build_scene_prompt(self, scaffold: Scaffold, voice_context: str, ...) -> str:
"""Assemble the complete prompt from scaffold components."""
base_prompt = f"""## SCENE {scaffold.chapter_number}.{scaffold.scene_number}
...
{self._build_character_ensemble_section(scaffold)}
{self._build_character_usage_instructions(scaffold)}
{self._build_prior_scene_context(scaffold)}
{self._build_codex_section(scaffold)}
{self._build_research_ingredients_section(scaffold)}
{voice_context}
## REQUIREMENTS
- Target: ~{target_word_count} words
- Voice: Character actively observing/thinking, NOT AI explaining character
...
"""
return base_prompt
Each _build_* method extracts and formats one aspect of context:
def _build_character_usage_instructions(self, scaffold: Scaffold) -> str:
"""Task 89: Tell LLM HOW to use character data."""
if not scaffold.character_ensemble:
return ""
lines = ["## CHARACTER USAGE INSTRUCTIONS (CRITICAL)", ""]
# Find protagonist and require naming
protagonist = next(
(c for c in scaffold.character_ensemble if c.get("role") == "protagonist"),
None
)
if protagonist:
name = protagonist.get("name", "the protagonist")
lines.append(f"**Protagonist Naming:**")
lines.append(f"- The protagonist is **{name}** - they MUST be addressed by name")
# For each supporting character, create demonstration instruction
for char in scaffold.character_ensemble:
if char.get("role") == "protagonist":
continue
name = char.get("name", "Unknown")
role = char.get("role", "supporting")
# Use whatever data we have to create specific instruction
if char.get("core_belief"):
demo = f'show belief: "{char["core_belief"]}"'
elif char.get("relationship_to_protagonist"):
demo = f'demonstrate: {char["relationship_to_protagonist"].split(".")[0]}'
else:
demo = f'demonstrate their {role} role through action or dialogue'
lines.append(f"- If **{name}** ({role}) appears: {demo}")
return "\n".join(lines)
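For illustration, here is roughly what this method produces when fed a minimal ensemble. The `SimpleNamespace` stand-in works because only `character_ensemble` is read; `service` is assumed to be an already-instantiated `SceneWriterService`:

```python
from types import SimpleNamespace

scaffold = SimpleNamespace(character_ensemble=[
    {"name": "Lukas", "role": "protagonist", "fatal_flaw": "Lacks self-awareness"},
    {"name": "Marina", "role": "ally",
     "core_belief": "Systems must serve humans, not control them"},
    {"name": "Sara", "role": "antagonist",
     "core_belief": "If the system allows it, it must be right"},
])

print(service._build_character_usage_instructions(scaffold))
# ## CHARACTER USAGE INSTRUCTIONS (CRITICAL)
#
# **Protagonist Naming:**
# - The protagonist is **Lukas** - they MUST be addressed by name
# - If **Marina** (ally) appears: show belief: "Systems must serve humans, not control them"
# - If **Sara** (antagonist) appears: show belief: "If the system allows it, it must be right"
```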
4. Scene Generation (Multi-Model Tournament)
The assembled prompt goes to multiple LLMs simultaneously:
async def generate_scene_variants(self, scaffold: Scaffold, ...) -> List[SceneVariant]:
"""Generate scene variants using tournament approach."""
# Build the prompt once
prompt = self._build_scene_prompt(scaffold, voice_context, ...)
# Send to multiple models in parallel
tasks = []
for provider, model in self.tournament_models:
for strategy in [ACTION, CHARACTER, DIALOGUE, ATMOSPHERIC, BALANCED]:
tasks.append(self._generate_single_variant(
provider, model, strategy, prompt
))
# Collect results
variants = await asyncio.gather(*tasks)
# Auto-score all variants
for variant in variants:
variant.score = await self.scene_analyzer.analyze(variant.content)
return sorted(variants, key=lambda v: v.score, reverse=True)
The Continuity Problem: Scene Numbering
The Bug We Fixed
Scene summaries are stored with a scene_number for ordering. The original code:
# BROKEN: Only extracts scene portion
def _extract_scene_number(self, scene_id: str) -> int:
match = re.search(r'sc(\d+)', scene_id)
if match:
return int(match.group(1)) # act1_ch01_sc01 -> 1
# act1_ch02_sc01 -> 1 COLLISION!
This caused act1_ch01_sc01 and act1_ch02_sc01 to both have scene_number=1, breaking continuity queries.
The Fix: Global Ordinal
# FIXED: Computes global ordinal across acts/chapters
def _extract_scene_number(self, scene_id: str) -> int:
"""
Formula: (act-1)*10000 + (chapter-1)*100 + scene
Examples:
act1_ch01_sc01 -> 1 (0*10000 + 0*100 + 1)
act1_ch01_sc02 -> 2 (0*10000 + 0*100 + 2)
act1_ch02_sc01 -> 101 (0*10000 + 1*100 + 1) <- Now distinct!
act2_ch01_sc01 -> 10001 (1*10000 + 0*100 + 1)
"""
match = re.search(r'act(\d+)_ch(\d+)_sc(\d+)', scene_id)
if match:
act = int(match.group(1))
chapter = int(match.group(2))
scene = int(match.group(3))
return (act - 1) * 10000 + (chapter - 1) * 100 + scene
Now querying scene_number < 101 correctly returns all Chapter 1 scenes.
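As a standalone sanity check, the same computation can be run outside the service against the example IDs:

```python
import re

def extract_scene_number(scene_id: str) -> int:
    # Standalone copy of the fixed logic above, for experimentation.
    match = re.search(r"act(\d+)_ch(\d+)_sc(\d+)", scene_id)
    if not match:
        raise ValueError(f"Unrecognized scene id: {scene_id}")
    act, chapter, scene = (int(g) for g in match.groups())
    return (act - 1) * 10000 + (chapter - 1) * 100 + scene

ids = ["act1_ch01_sc01", "act1_ch01_sc02", "act1_ch02_sc01", "act2_ch01_sc01"]
print([extract_scene_number(i) for i in ids])             # [1, 2, 101, 10001]
print([i for i in ids if extract_scene_number(i) < 101])  # Chapter 1 scenes only
```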
Key Design Principles
1. Structured Over Stuffed
Don’t dump raw text. Organize context into labeled sections the LLM can reference.
2. Instructions Over Data
Data alone isn’t enough. The CHARACTER USAGE INSTRUCTIONS section tells the LLM how to use the character data, not just what the data is.
3. Graceful Degradation
Every data source is wrapped in try/except. If Codex fails, the scene still generates - just without research snippets.
try:
codex_snippets = router.query_codex(query, top_k=5)
except Exception as e:
logger.warning(f"Codex unavailable: {e}")
codex_snippets = [] # Continue without Codex
4. Provenance Tracking
Every piece of context knows where it came from:
from dataclasses import dataclass

@dataclass
class CodexSnippet:
    content: str
    source_file: str
    relevance_score: float

@dataclass
class ResearchIngredient:
    name: str
    source_notebook: str
    source_project: str
5. The “Lost in the Middle” Fix
Critical instructions go at the END of the prompt (REQUIREMENTS, GENRE CALIBRATION) because LLMs attend most to beginnings and endings.
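Conceptually, the assembly order looks like the sketch below. Section names follow the example scaffold earlier in this document; the exact ordering in scene_writer_service.py may differ:

```python
# High-attention zones are the start (scene identity, beat) and the end
# (voice rules, requirements); bulk reference material sits in the middle.
SECTION_ORDER = [
    "CHAPTER OVERVIEW",
    "BEAT CONTEXT",
    "STRATEGIC CONTEXT",
    "CHARACTER ENSEMBLE",
    "CHARACTER USAGE INSTRUCTIONS (CRITICAL)",
    "PRIOR SCENE SUMMARIES",
    "CODEX SNIPPETS (FROM YOUR RESEARCH)",
    "RESEARCH INGREDIENTS (RESEARCH GRAPH)",
    "VOICE CALIBRATION",
    "REQUIREMENTS",  # last: highest attention for the hardest rules
]

def assemble_prompt(sections: dict[str, str]) -> str:
    return "\n\n".join(
        f"## {name}\n{sections[name]}" for name in SECTION_ORDER if name in sections
    )
```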
Database Schema: Where Context Lives
writers_factory.db (Story Graph - Canonical Truth)
├── scenes - Promoted scenes with beat linkage
├── scene_summaries - LLM-extracted summaries for continuity (Task 90)
├── chapters - Chapter metadata
└── beats - Beat sheet structure
research_graph.db (Research Graph - Creative Ingredients)
├── ingredients - Characters, worlds, themes, plot beats
└── provenance - Source tracking for each ingredient
sessions.db (Workspace State)
├── sessions - Chat history
├── foreman_kb - Crystallized decisions
└── settings - User configuration
The Implementation Journey: Tasks 87-94
Building the scaffold prompt was an iterative process. Each task addressed a specific gap between what we wanted and what the LLM actually received.
Task 87: Character Ensemble Injection
Problem: Scene prompts had no character context. The LLM didn’t know who was the protagonist, antagonist, or what their relationships were.
Solution: Parse Story Bible files (Characters/, Relationships_Web.md) and inject a CHARACTER ENSEMBLE section.
Files Modified:
- `story_bible_service.py` - Added `get_character_ensemble()` method
- `scaffold_generator_service.py` - Added `character_ensemble` field to `Scaffold`
- `scene_writer_service.py` - Added `_build_character_ensemble_section()`
Task 88: Beat Enrichment Pipeline
Problem: Beat files were too thin. “Catalyst: External event forces protagonist to act” doesn’t tell the LLM which characters should appear or what should happen.
Solution: On-demand LLM enrichment that adds per-beat character assignments and scene guidance.
Key Insight: This runs BEFORE scene generation, during Voice Calibration, so beats are enriched by the time you reach Director Mode.
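The enriched beat format isn't reproduced in this document, but conceptually the enrichment step turns the thin entry into something like the following (field names are illustrative, not the actual schema):

```python
# Hypothetical enriched beat; actual field names and structure may differ.
enriched_beat = {
    "beat_name": "Catalyst",
    "beat_percentage": 12,
    "beat_type": "false_defeat",
    "description": "External event forces protagonist to act",
    # Added by enrichment:
    "characters": ["Lukas", "Marina"],
    "scene_guidance": (
        "Marina publicly challenges Lukas at the Civic Forum, forcing him "
        "to defend the algorithm in front of the people it has flagged."
    ),
}
```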
Task 89: Character Usage Instructions
Problem: Characters were in the prompt but appeared passively. Sara would “nod approvingly” instead of demonstrating her belief that “if the system allows it, it must be right.”
Solution: Explicit instructions telling the LLM HOW to use character data, not just providing data.
Key Pattern:
## CHARACTER USAGE INSTRUCTIONS (CRITICAL)
**Protagonist Naming:**
- The protagonist is **Lukas** - they MUST be addressed by name via dialogue
**Character Demonstration Requirements:**
- If **Marina** (ally) appears: show belief: "Systems must serve humans"
- If **Sara** (antagonist) appears: show belief: "If the system allows it, it must be right"
Task 90: Prior Scene Summaries
Problem: No continuity. The LLM writing Scene 5 didn’t know what happened in Scenes 1-4.
Solution: Extract and store summaries when scenes are “promoted” (approved), then inject them into subsequent scaffolds.
Database Table: scene_summaries with scene_number (global ordinal), characters, key_events, summary_text.
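The DDL isn't shown in this document, but assuming SQLite (the *.db files suggest it), the table and the continuity query might look like this; column types and constraints are assumptions:

```python
import sqlite3

conn = sqlite3.connect("writers_factory.db")
conn.execute("""
CREATE TABLE IF NOT EXISTS scene_summaries (
    scene_id      TEXT PRIMARY KEY,   -- e.g. 'act1_ch01_sc01'
    scene_number  INTEGER NOT NULL,   -- global ordinal, used for ordering
    characters    TEXT,               -- e.g. JSON list of names
    key_events    TEXT,               -- e.g. JSON list of events
    summary_text  TEXT NOT NULL
)
""")

# Continuity query: everything before Chapter 2, Scene 1 (global ordinal 101).
rows = conn.execute(
    "SELECT summary_text FROM scene_summaries "
    "WHERE scene_number < ? ORDER BY scene_number",
    (101,),
).fetchall()
```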
Task 91: Inter-Character Relationships
Problem: Q7 answers like “love interest is Sara” and “Marina is Sara’s mother” were being parsed but lost before reaching Relationships_Web.md.
Solution: Add skeleton_role, related_to, relationship_type fields and trace them through the entire pipeline:
Q7 parsing → CharacterSeed → CharacterIngredient (DB) → collision_to_seed() → Relationships_Web.md
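On the `CharacterSeed` side, the three threaded fields look roughly like this (an illustrative sketch; the real dataclass has more fields):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CharacterSeed:
    name: str
    skeleton_role: Optional[str] = None      # e.g. "love_interest" for Sara
    related_to: Optional[str] = None         # e.g. "Sara" for Marina
    relationship_type: Optional[str] = None  # e.g. "mother"

# From the two Q7 answers above:
sara = CharacterSeed(name="Sara", skeleton_role="love_interest")
marina = CharacterSeed(name="Marina", related_to="Sara", relationship_type="mother")
```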
Task 92: Scene ID Collision Bug
Problem: Beat 2 was overwriting Scene 1! Both generated act1_ch01_sc01 because generateSceneId() was checking the old Working/ folder (always empty after Task 84 renamed it to Scenes/).
Root Cause: One-line bug in DirectorDropdown.svelte.
Fix: listScenes('Working', ...) → listScenes('Scenes', ...)
Task 93: Q7 Natural Language Parsing
Problem: Q7 parsing patterns didn’t match natural language:
- “love interest IS Sara” → no match (expected “love interest Sara”)
- “between Lukas and his father, Geoff” → no match
Solution: Add three new regex patterns to handle “is”, “between…and”, and “has…with” constructions.
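The actual Task 93 regexes aren't reproduced here, but illustrative patterns for the three constructions might look like this:

```python
import re

RELATIONSHIP_PATTERNS = [
    # "love interest is Sara"
    re.compile(r"(?P<role>[\w ]+?)\s+is\s+(?P<name>\w+)", re.IGNORECASE),
    # "between Lukas and his father, Geoff"
    re.compile(
        r"between\s+(?P<a>\w+)\s+and\s+(?:his|her|their)\s+"
        r"(?P<relation>[\w ]+?),\s*(?P<b>\w+)",
        re.IGNORECASE,
    ),
    # "Lukas has a rivalry with Sara"
    re.compile(
        r"(?P<a>\w+)\s+has\s+(?:an?\s+)?(?P<relation>[\w ]+?)\s+with\s+(?P<b>\w+)",
        re.IGNORECASE,
    ),
]

for text in ["love interest is Sara",
             "between Lukas and his father, Geoff",
             "Lukas has a rivalry with Sara"]:
    for pattern in RELATIONSHIP_PATTERNS:
        match = pattern.search(text)
        if match:
            print(f"{text!r} -> {match.groupdict()}")
            break
```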
Task 94: Beat Context Section
Problem: The example prompt showed a ## BEAT CONTEXT section with beat name, percentage, function, and type. But the actual implementation only baked beat info into the core_function string—losing the structured fields.
Gap Analysis:
# What we had:
core_function=f"Serve {beat_info.beat_name} beat: {beat_info.description}"
# What the LLM needed:
## BEAT CONTEXT
- **Beat:** Catalyst (12%)
- **Function:** Disrupt protagonist's ordinary world
- **Type:** false_defeat
Solution:
- Add `beat_name`, `beat_percentage`, `beat_type`, `beat_function` to the `Scaffold` dataclass
- Populate them from `BeatInfo` during scaffold generation
- Add a `_build_beat_context_section()` method (sketched below)
- Inject it after SCENE CONTEXT in the prompt
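A plausible shape for the new builder method, assuming the four fields above (the actual implementation may differ):

```python
def _build_beat_context_section(self, scaffold: Scaffold) -> str:
    """Render the BEAT CONTEXT block from the fields added in Task 94."""
    if not scaffold.beat_name:
        return ""
    return "\n".join([
        "## BEAT CONTEXT",
        f"- **Beat:** {scaffold.beat_name} ({scaffold.beat_percentage}%)",
        f"- **Function:** {scaffold.beat_function}",
        f"- **Type:** {scaffold.beat_type}",
    ])
```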
Why It Matters: The beat type (false_victory, false_defeat) tells the LLM the emotional trajectory. A Midpoint that’s a false victory needs very different prose than one that’s a false defeat.
Lessons for Students
1. Document First, Then Compare
The example prompt in this document was written BEFORE the implementation was complete. By comparing the ideal prompt to the actual code output, we found gaps (like Task 94’s missing BEAT CONTEXT).
Technique: Write the prompt you WANT the LLM to receive, then trace through the code to see what it ACTUALLY receives.
2. Data Flow Is Everything
Most bugs in Tasks 91-93 were data flow issues—fields were parsed correctly but lost somewhere in the pipeline. Tracing Q7 → parsing → dataclass → database → API → generator → prompt revealed where fields were dropped.
3. Small Bugs, Big Impact
Task 92 was a one-line fix ('Working' → 'Scenes'), but it caused scenes to overwrite each other. Never underestimate path/folder changes.
4. Graceful Degradation
Every data source is wrapped in try/except:
try:
codex_snippets = router.query_codex(query)
except Exception:
codex_snippets = [] # Continue without Codex
The system should work with partial data, not fail if one source is unavailable.
Testing Your Understanding
- Why do we need CHARACTER USAGE INSTRUCTIONS separate from CHARACTER ENSEMBLE?
- ENSEMBLE provides data; INSTRUCTIONS tell the LLM how to use it
- Without instructions, characters appear passively (nodding, watching)
- What’s the global ordinal formula and why does it matter?
- `(act-1)*10000 + (chapter-1)*100 + scene`
- Ensures scenes from different chapters have distinct, ordered numbers
- Why is voice context placed near the end of the prompt?
- LLMs suffer from “lost in the middle” - they attend to start and end
- Critical style guidance needs to be in high-attention zones
- What happens if the Codex query fails?
- Scene generation continues without research snippets
- Graceful degradation ensures the system never blocks on optional features
- Why does beat type (false_victory/false_defeat) matter?
- It tells the LLM the emotional trajectory of the scene
- A Midpoint false victory (“everything is going great”) needs different prose than a false defeat (“first major setback”)
- What technique helped identify Task 94?
- Comparing the documented “ideal prompt” to the actual code output
- The BEAT CONTEXT section was in the docs but not in the implementation
Document created: 2026-01-18
Updated: 2026-01-19 with the Tasks 91-94 implementation journey
Based on the Tasks 87-94 implementation by Claude Code and Claude Cowork (Opus 4.5)