I didn’t build an AI coding framework.

I built something arguably harder: a comprehensive prompt library that turns an existing AI tool (Roo Code) into a production-grade development system with security controls, automated testing, and deterministic output.

3,000+ lines of carefully architected prompts. 12 specialized agent modes. Mandatory reasoning protocols. Security-first design.

Here’s what I learned about prompt engineering at scale, and why most developers are doing it wrong.

The Problem With AI Coding Tools

Every AI coding assistant has the same fundamental issues:

Inconsistent output:

No security controls:

Missing testing:

Poor architecture:

Most developers accept this as “the cost of using AI.” I decided to fix it with better prompts.

The Solution: Systematic Prompt Architecture

I use Roo Code, an agentic AI coding tool with a “custom modes” system. Think of it as programmable AI agents, each with specialized roles.

Instead of writing ad-hoc prompts for each task, I built SPARC: a comprehensive prompt library implementing a formal software development methodology.

S - Specification & Pseudocode
P - Planning & Architecture
A - Auto-Coding & Testing
R - Refinement & Review
C - Completion & Versioning

Not just a workflow. A complete prompt architecture with 12 specialized agents, each with explicit role definitions, reasoning protocols, and security mandates.

How Prompt Architecture Actually Works

Most people write prompts like this:

"Build me a login system with JWT auth"

Then they wonder why the output is garbage.

Here’s what a production-grade prompt actually looks like:

Example: The ‘code’ Agent

Role Definition:

You are a Senior Production Engineer specializing in Test-Driven
Development (TDD). Your core mandate is to implement the minimum
necessary code to transform failing tests into passing tests. You
MUST strictly adhere to the functional requirements (pseudocode,
Big O analysis) from the 'architect' agent and the design tokens
from the 'ui-ux-interpreter' agent, prioritizing secure and
modular implementation.

Custom Instructions (Abbreviated):

[ROLE & CONTEXT]
Act as a highly disciplined senior software engineer specialized
in the {PROJECT_STACK}. You are implementing the
{MAIN_FEATURE_BRANCH_NAME} feature. Your input specifications,
pseudocode, architectural constraints, design guidelines, and the
complete set of failing tests are provided within
<SPECIFICATION_INPUT> tags.

[REASONING & GUARDRAILS - Chain-of-Thought using <THINKING> tags]
Before generating ANY code, you MUST encapsulate your preparatory
work within <THINKING></THINKING> tags, strictly following these steps:

1. DECOMPOSE & TEST ANALYSIS: Analyze the specific failed assertions.
   Decompose the required implementation into minimal, logical,
   modular functions.

2. ADHERENCE CHECK: Verify strict adherence to UI/UX design tokens
   and architectural constraints (data model, Big O complexity).

3. SECURITY & SECRETS REVIEW: Explicitly check for handling secrets.
   NEVER hardcode secrets (a security mandate).

4. EXECUTION PLAN: Outline the sequence in which code blocks will
   be generated to pass the tests.

[STYLE & CONSTRAINTS]
Use low creativity decoding parameters (Temperature ≤ 0.3, Top-P ≤ 0.5)
to ensure logical consistency and accurate syntax.

⚠️ SECURITY MANDATE: NEVER STAGE, COMMIT, or CREATE files named
.env or any files containing hardcoded secrets.

Why this works:

  1. Explicit role → AI understands its exact responsibility
  2. Mandatory reasoning → AI must show its work via <THINKING> tags
  3. Security mandates → Built-in guardrails prevent vulnerabilities
  4. Temperature controls → Deterministic output, not creative guessing
  5. Context injection → All necessary info provided in structured tags

This isn’t a prompt. It’s a software specification for AI behavior.
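To make that concrete, here is a rough sketch of the shape each agent definition takes. The field names below are my own labels for illustration, not the exact Roo Code custom-mode schema:

// Illustrative shape of a SPARC agent definition (field names are
// descriptive labels, not the exact Roo Code configuration schema).
interface AgentMode {
  slug: string;                 // e.g. "code", "tdd", "security-review"
  roleDefinition: string;       // the "You are a Senior Production Engineer..." block
  customInstructions: string;   // reasoning protocol, guardrails, style constraints
  maxTemperature: number;       // decoding ceiling the prompt mandates (e.g. 0.3)
  requiredTags: string[];       // tags the agent must emit, e.g. ["THINKING"]
  securityMandates: string[];   // non-negotiable rules
}

const codeAgent: AgentMode = {
  slug: "code",
  roleDefinition: "You are a Senior Production Engineer specializing in TDD...",
  customInstructions: "[ROLE & CONTEXT] ... [REASONING & GUARDRAILS] ...",
  maxTemperature: 0.3,
  requiredTags: ["THINKING"],
  securityMandates: ["NEVER stage, commit, or create .env files or hardcoded secrets"],
};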

The 12 Specialized Agents

Each agent has a narrow, well-defined purpose:

1. 🎨 UI/UX Interpreter

2. 📋 Specification Writer

3. 🏗️ Architect

4. 🧪 Tester (TDD)

5. 🧠 Auto-Coder

6. 🪲 Debugger

7. 🛡️ Security Reviewer

8. 🔗 System Integrator

9. 🧹 Optimizer

10. 📚 Documentation Writer

11. 📦 Version Manager

12. 🌳 Git Expert

Plus the orchestrator:

⚡️ SPARC Orchestrator

The Systematic Approach: How It Actually Works

When you give SPARC an objective like “Build a user authentication system,” here’s what happens:

Phase 1: Specification (S)

spec-pseudocode agent activates:

<THINKING>
1. DECOMPOSE & DEFINE:
   - User registration flow
   - Login/logout flow
   - Password reset flow
   - Session management

2. DATA STRUCTURES:
   - User model (id, email, password_hash, created_at)
   - Session model (token, user_id, expires_at)

3. ETHICAL & LOGIC REVIEW:
   - No bias in user filtering
   - PII handling follows GDPR

4. FLOW CONSTRUCTION:
   - Registration: validate → hash → store → send verification
   - Login: validate → verify password → create session
   - Error handling for each step
</THINKING>

Output: algorithm_specification.md

The agent writes formal pseudocode defining every logical step, data structure, and error condition.

Phase 2: Planning & Architecture (P)

architect agent activates:

<THINKING>
1. STEP-BACK ABSTRACTION:
   - Scalability: Support 10k concurrent users
   - Security: Bcrypt hashing, JWT tokens, rate limiting
   - Fault tolerance: Database connection pooling

2. LOGICAL DECOMPOSITION:
   - Auth Service (handles login/registration)
   - Session Service (manages tokens)
   - PostgreSQL database

3. PERFORMANCE ANALYSIS:
   - Password hash check: O(1) with bcrypt
   - User lookup by email: O(1) with indexed query
   - Session validation: O(1) with token cache

4. SECURITY REVIEW:
   - Secrets in environment variables
   - Rate limiting: 5 attempts per IP per minute
   - SQL injection prevention: Parameterized queries
   - No PII in logs

5. DATA MODEL:
   Users table: id, email, password_hash, verified, created_at
   Sessions table: token, user_id, expires_at, created_at
</THINKING>

Output: architecture_blueprint.md + system_diagram.mmd

The architect designs the complete system with performance estimates and security controls.

Phase 3: Auto-Coding & Testing (A) - The TDD Loop

tdd agent writes failing tests:

// src/tests/auth.test.ts
import { registerUser } from "../auth/register"; // registerUser lives in src/auth/register.ts

describe("User Registration", () => {
  it("should create user with valid email and password", async () => {
    const result = await registerUser("test@example.com", "SecurePass123!");
    expect(result.success).toBe(true);
    expect(result.user.email).toBe("test@example.com");
  });

  it("should reject weak passwords", async () => {
    const result = await registerUser("test@example.com", "weak");
    expect(result.success).toBe(false);
    expect(result.error).toContain("Password too weak");
  });

  it("should prevent duplicate email registration", async () => {
    await registerUser("test@example.com", "SecurePass123!");
    const result = await registerUser("test@example.com", "SecurePass456!");
    expect(result.success).toBe(false);
    expect(result.error).toContain("Email already exists");
  });
});

Tests run. They fail (Red phase). This is expected.

code agent implements to pass tests:

<THINKING>
1. DECOMPOSE & TEST ANALYSIS:
   Failed assertions:
   - registerUser function doesn't exist
   - Password validation not implemented
   - Duplicate email check missing

2. ADHERENCE CHECK:
   - Use bcrypt (from architecture spec)
   - Password must be 12+ chars, mixed case, symbols
   - PostgreSQL with parameterized queries

3. SECURITY REVIEW:
   - No hardcoded secrets
   - Use env variables for DB connection
   - Bcrypt cost factor 12 (from spec)

4. EXECUTION PLAN:
   - Create User model
   - Implement password validation
   - Implement registerUser function
   - Add duplicate email check
</THINKING>

// Implementation code here...
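As a rough sketch of what the implementation at this point might look like, assuming bcrypt and a node-postgres pool (helper names like isStrongPassword are illustrative, not the exact generated code):

// src/auth/register.ts (illustrative sketch, not the full generated file)
import bcrypt from "bcrypt";
import { Pool } from "pg";

// Connection details come from environment variables, never hardcoded.
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

const BCRYPT_COST = 12; // cost factor mandated by the architecture spec

function isStrongPassword(password: string): boolean {
  // 12+ chars, mixed case, digit, and symbol, per the spec
  return (
    password.length >= 12 &&
    /[a-z]/.test(password) &&
    /[A-Z]/.test(password) &&
    /[0-9]/.test(password) &&
    /[^A-Za-z0-9]/.test(password)
  );
}

export async function registerUser(email: string, password: string) {
  if (!isStrongPassword(password)) {
    return { success: false, error: "Password too weak" };
  }

  // Parameterized query prevents SQL injection.
  const existing = await pool.query("SELECT id FROM users WHERE email = $1", [email]);
  if (existing.rows.length > 0) {
    return { success: false, error: "Email already exists" };
  }

  const passwordHash = await bcrypt.hash(password, BCRYPT_COST);
  const inserted = await pool.query(
    "INSERT INTO users (email, password_hash) VALUES ($1, $2) RETURNING id, email",
    [email, passwordHash]
  );

  return { success: true, user: inserted.rows[0] };
}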

Tests run again. If they pass (Green phase), move forward. If they fail, debug agent activates.

debug agent fixes failures:

<THINKING>
1. ANALYSIS & RCA:
   Error log shows: "bcrypt is not defined"
   Root cause: Missing import statement
   Location: src/auth/register.ts, line 15

2. CONSTRAINT CHECK:
   Fix doesn't violate architectural constraints
   Still adheres to Big O limits

3. SECURITY REVIEW:
   Fix doesn't introduce new vulnerabilities

4. EXECUTION PLAN:
   Add: import bcrypt from 'bcrypt';
   Re-run tests to confirm
</THINKING>

Output: Bug fix + root_cause_analysis.md

This loop continues until all tests pass.

Phase 4: Refinement & Review (R)

security-review agent audits everything:

<THINKING>
1. SCOPE & THREAT MODELING:
   User input points: email, password
   External calls: Database queries
   Data persistence: PostgreSQL

2. SECRET & PII AUDIT:
   ✓ No hardcoded secrets found
   ✓ No .env file in codebase
   ✓ PII (email) properly handled

3. VULNERABILITY SCAN:
   ✓ SQL injection: Using parameterized queries
   ✓ Password security: Bcrypt with cost 12
   ✓ Rate limiting: Implemented
   ⚠ Missing CSRF protection for session endpoints

4. REMEDIATION PLAN:
   Add CSRF token validation to session creation
</THINKING>

Output: security_audit_report.md with findings
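The remediation that audit calls for could look something like the sketch below, assuming an Express-style API; the double-submit cookie approach and helper names are illustrative, not necessarily the exact fix the agent produced:

// Illustrative CSRF guard for session endpoints (double-submit cookie pattern).
import crypto from "crypto";
import type { Request, Response, NextFunction } from "express";

// Issue a CSRF token alongside the session cookie.
export function issueCsrfToken(res: Response): string {
  const token = crypto.randomBytes(32).toString("hex");
  res.cookie("csrf_token", token, { sameSite: "strict", secure: true });
  return token;
}

// Reject state-changing requests whose header doesn't match the cookie.
// req.cookies assumes the cookie-parser middleware is installed.
export function requireCsrfToken(req: Request, res: Response, next: NextFunction) {
  const cookieToken = req.cookies?.csrf_token;
  const headerToken = req.get("x-csrf-token");
  if (!cookieToken || !headerToken || cookieToken !== headerToken) {
    return res.status(403).json({ error: "Invalid CSRF token" });
  }
  next();
}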

integration agent tests cross-component flows:

<THINKING>
1. ANALYSIS & THREAT IDENTIFICATION:
   Integration points:
   - Frontend → Auth API
   - Auth Service → Database
   - Auth Service → Session Service

2. TEST PLAN GENERATION:
   E2E test: Complete registration + login flow

3. ACTION & OBSERVATION:
   Running integration tests...
   [Test output shows session creation failing]

4. REMEDIATION:
   Session Service not receiving user_id correctly
   Fix: Update data contract in Auth Service
</THINKING>

Output: Integration fixes + integration_report.md

refinement-optimization-mode improves performance:

<THINKING>
1. BOTTLENECK IDENTIFICATION:
   Password verification during login is slow
   Current: Bcrypt verify on every request
   Big O: O(1) but high constant factor

2. REFACTOR PLAN:
   Add session token caching (Redis)
   Only verify password on login, not every request
   Reduces load by 95%

3. SECURITY REVIEW:
   Session cache doesn't expose passwords
   Cache expiry matches session expiry
</THINKING>

Output: Optimized code + optimization_report.md
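As a sketch of that optimization, assuming ioredis as the cache client (any cache with get/set and a TTL works the same way):

// Illustrative session validation with a Redis cache in front of the database.
import Redis from "ioredis";
import { Pool } from "pg";

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

const SESSION_TTL_SECONDS = 60 * 60; // cache expiry matches session expiry

export async function validateSession(token: string): Promise<string | null> {
  // Fast path: token found in cache, no database hit and no bcrypt work.
  const cachedUserId = await redis.get(`session:${token}`);
  if (cachedUserId) return cachedUserId;

  // Slow path: fall back to the sessions table.
  const result = await pool.query(
    "SELECT user_id FROM sessions WHERE token = $1 AND expires_at > NOW()",
    [token]
  );
  if (result.rows.length === 0) return null;

  const userId = String(result.rows[0].user_id);
  await redis.set(`session:${token}`, userId, "EX", SESSION_TTL_SECONDS);
  return userId;
}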

Phase 5: Completion & Versioning (C)

docs-writer generates documentation:

# User Authentication System

## Overview

JWT-based authentication with bcrypt password hashing, PostgreSQL storage,
and Redis session caching.

## Architecture

[Details from architecture_blueprint.md]

## API Endpoints

POST /auth/register - Create new user account
POST /auth/login - Authenticate and create session
POST /auth/logout - Invalidate session
POST /auth/reset-password - Initiate password reset

## Security Controls

- Bcrypt password hashing (cost factor 12)
- Rate limiting (5 attempts/IP/minute)
- CSRF protection on session endpoints
- SQL injection prevention via parameterized queries

## Performance

- Login: O(1) with indexed database lookup
- Session validation: O(1) with Redis cache
- Handles 10k concurrent users

version-manager calculates version bump:

<THINKING>
1. ANALYZE IMPACT:
   New feature: User authentication system
   Backwards-compatible: Yes (new endpoints, no breaking changes)

2. ASSESS NEED:
   This is a new feature, version bump required

3. DETERMINE LEVEL:
   MINOR bump (new backwards-compatible functionality)

4. CALCULATE NEW VERSION:
   Current: 1.2.3
   New: 1.3.0 (MINOR bump resets PATCH to 0)
</THINKING>

Output: package.json updated to v1.3.0

git-expert commits everything:

<THINKING>
1. DECOMPOSE & MAP:
   - Check current branch
   - Stage modified files
   - Commit with descriptive message
   - Push to remote

2. SECURITY AUDIT:
   File list review:
   ✓ src/auth/register.ts
   ✓ src/auth/login.ts
   ✓ src/tests/auth.test.ts
   ✓ docs/authentication.md
   ✓ package.json
   ✗ No .env files detected

3. EXECUTION PLAN:
   git add [files]
   git commit -m "feat: Add user authentication system"
   git push origin feature/auth-system
</THINKING>

Output: Committed and pushed
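That staged-file security audit is also easy to automate outside the prompt. A minimal sketch, with illustrative blocking patterns:

// Illustrative pre-commit check: refuse to commit .env files or obvious secrets.
import { execSync } from "child_process";
import { existsSync, readFileSync } from "fs";

const BLOCKED_FILES = [/(^|\/)\.env($|\.)/];                                  // .env, .env.local, ...
const SECRET_PATTERNS = [/(API_KEY|SECRET|PASSWORD)\s*=\s*["'][^"']+["']/i];  // illustrative pattern

const staged = execSync("git diff --cached --name-only", { encoding: "utf8" })
  .split("\n")
  .filter(Boolean);

for (const file of staged) {
  if (BLOCKED_FILES.some((re) => re.test(file))) {
    console.error(`Blocked: ${file} looks like an environment file.`);
    process.exit(1);
  }
  if (!existsSync(file)) continue; // file was deleted in this commit
  const contents = readFileSync(file, "utf8");
  if (SECRET_PATTERNS.some((re) => re.test(contents))) {
    console.error(`Blocked: ${file} appears to contain a hardcoded secret.`);
    process.exit(1);
  }
}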

What This Actually Achieves

This isn’t about coding faster. It’s about systematic quality control.

Before SPARC:

After SPARC:

Real Results:

Fusion Party Infrastructure:

MagnetLab Client Projects:

This Portfolio:

Lessons From 3,000 Lines of Prompts

1. Temperature Settings Matter More Than You Think

Most developers never adjust temperature settings. This is a mistake.

For creative tasks (naming, UX copy, marketing): higher temperatures are fine, since variation is the point.

For code generation (implementation, debugging): keep temperature low (≤ 0.3) for logical consistency and accurate syntax.

For structured data (JSON, version numbers): lock it down further (≤ 0.1), because there is exactly one right answer.

In SPARC, every agent has explicit temperature requirements in the prompt. The code agent uses ≤0.3. The version manager uses ≤0.1.

Result: Consistent, predictable output instead of random variations.
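Here is a sketch of how those per-task ceilings can live in code. The callModel wrapper and the creative-task values are assumptions; the ≤ 0.3 and ≤ 0.1 ceilings are the ones SPARC mandates:

// Per-agent decoding settings (an illustrative wrapper, not a specific vendor SDK).
interface DecodingOptions {
  temperature: number;
  topP: number;
}

const DECODING_BY_TASK: Record<string, DecodingOptions> = {
  creative:   { temperature: 0.8, topP: 0.95 }, // naming, UX copy (assumed values)
  code:       { temperature: 0.3, topP: 0.5 },  // ceilings the code agent mandates
  structured: { temperature: 0.1, topP: 0.3 },  // version numbers, JSON (topP assumed)
};

// callModel stands in for whatever client you use (OpenAI, Anthropic, a local model).
async function callModel(prompt: string, options: DecodingOptions): Promise<string> {
  // ...send the prompt with options.temperature and options.topP...
  return "";
}

// Example: callModel(codeAgentPrompt, DECODING_BY_TASK.code);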

2. Reasoning Transparency Prevents Hallucinations

The single most effective technique I discovered: mandatory reasoning tags.

Every SPARC agent must use <THINKING> tags before generating output:

<THINKING>
1. Analyze the requirements
2. Check security constraints
3. Plan the solution
4. Validate against specs
</THINKING>

[Then generate actual output]

Why this works:

Without reasoning tags, AI might generate code that “looks right” but violates requirements. With reasoning tags, you see the flawed logic before it becomes flawed code.
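This is also easy to enforce mechanically. A small sketch of a gate that rejects any response missing its reasoning block (the function name is mine):

// Illustrative gate: refuse to accept output that skipped the reasoning step.
interface AgentResponse {
  reasoning: string; // contents of the <THINKING> block
  output: string;    // everything after it
}

export function parseAgentResponse(raw: string): AgentResponse {
  const match = raw.match(/<THINKING>([\s\S]*?)<\/THINKING>/);
  if (!match || match[1].trim().length === 0) {
    throw new Error("Agent response rejected: missing <THINKING> block");
  }
  return {
    reasoning: match[1].trim(),
    output: raw.slice(match.index! + match[0].length).trim(),
  };
}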

3. Security Can’t Be Bolted On

Security must be in every agent’s prompt, not just a final review step.

spec-pseudocode agent:

architect agent:

code agent:

security-review agent:

git-expert agent:

Result: Security violations caught at every phase, not just at the end.

4. Specificity Beats Generality

Bad prompts are vague. Good prompts are hyper-specific.

Bad prompt:

"Write good code"

Better prompt:

"Implement the authentication system using the pseudocode specification"

SPARC prompt:

"Generate code that passes the failing tests in <SPECIFICATION_INPUT>,
adheres to the Big O constraints (O(1) for user lookup) from the
architect agent, uses design tokens from ui-ux-interpreter, includes
comprehensive type hints and docstrings, uses bcrypt with cost factor 12,
implements rate limiting at 5 attempts per IP per minute, and stores
secrets in environment variables. Use Temperature ≤ 0.3 for deterministic
output."

The SPARC prompt provides:

No ambiguity. No guessing. Just requirements.

5. Agent Specialization Works

One generalist AI agent trying to do everything = mediocre at everything.

Twelve specialized agents, each expert in one thing = excellent at their specific task.

Why specialization works:

Narrow scope = Better prompts:

Clear ownership:

Validation at boundaries:

Progressive refinement:

6. Context Injection Is Critical

AI needs context. But not just any context. Structured, tagged context.

SPARC uses <SPECIFICATION_INPUT> tags to inject context:

<SPECIFICATION_INPUT>
<PSEUDOCODE>
[Algorithm specification from spec agent]
</PSEUDOCODE>

<ARCHITECTURE>
[System design from architect agent]
</ARCHITECTURE>

<DESIGN_TOKENS>
[UI/UX specifications from ui-ux-interpreter]
</DESIGN_TOKENS>

<FAILING_TESTS>
[Test output from tdd agent]
</FAILING_TESTS>
</SPECIFICATION_INPUT>

Why nested tags matter:

The code agent receives pseudocode, architecture, design tokens, and failing tests. It has everything needed, nothing extra.
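Assembling that block is plain string building. A sketch, with the artifact names taken from the phases above and an illustrative helper name:

// Illustrative builder for the <SPECIFICATION_INPUT> block handed to the code agent.
interface CodeAgentContext {
  pseudocode: string;    // algorithm_specification.md
  architecture: string;  // architecture_blueprint.md
  designTokens: string;  // output of ui-ux-interpreter
  failingTests: string;  // test runner output from the tdd agent
}

export function buildSpecificationInput(ctx: CodeAgentContext): string {
  return [
    "<SPECIFICATION_INPUT>",
    `<PSEUDOCODE>\n${ctx.pseudocode}\n</PSEUDOCODE>`,
    `<ARCHITECTURE>\n${ctx.architecture}\n</ARCHITECTURE>`,
    `<DESIGN_TOKENS>\n${ctx.designTokens}\n</DESIGN_TOKENS>`,
    `<FAILING_TESTS>\n${ctx.failingTests}\n</FAILING_TESTS>`,
    "</SPECIFICATION_INPUT>",
  ].join("\n\n");
}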

7. Validation Loops Catch Errors Early

The ReAct pattern (Reason → Act → Observe) is built into SPARC.

After every agent completes its task:

  1. Log the output (Memory Bank records what was done)
  2. Run validation (tests, linting, build checks)
  3. Observe results: if the tests pass, move forward; if they fail, the debug agent activates
  4. Repeat until green

This creates a feedback loop:

No “write code, deploy, hope it works.” Instead: “write code, test, fix, validate, then commit.”
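A sketch of that loop in code, with runAgent, runTests, and the iteration cap as illustrative placeholders:

// Illustrative Reason → Act → Observe loop around the code and debug agents.
interface TestResult { passed: boolean; log: string; }

declare function runAgent(agent: "code" | "debug", context: string): Promise<string>;
declare function runTests(): Promise<TestResult>;
declare function recordInMemoryBank(entry: string): void;

export async function tddLoop(specificationInput: string, maxIterations = 5): Promise<void> {
  let context = specificationInput;

  for (let i = 0; i < maxIterations; i++) {
    const output = await runAgent(i === 0 ? "code" : "debug", context); // Act
    recordInMemoryBank(output);                                         // Log
    const result = await runTests();                                    // Observe
    if (result.passed) return;                                          // Green: done
    context = `${specificationInput}\n\n<FAILING_TESTS>\n${result.log}\n</FAILING_TESTS>`;
  }

  throw new Error("TDD loop did not converge; human review required");
}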

8. Documentation Can’t Be An Afterthought

In SPARC, documentation is generated automatically as the final phase.

The docs-writer agent has access to:

It synthesizes everything into:

Why this works:

9. Versioning Requires Logic, Not Guessing

Most developers bump versions arbitrarily. SPARC calculates them deterministically.

The version-manager agent analyzes the feature:

<THINKING>
1. What changed?
   - New authentication endpoints (new feature)

2. Is it backwards-compatible?
   - Yes (existing endpoints unchanged)

3. What's the bump level?
   - MINOR (new backwards-compatible functionality)

4. Calculate new version:
   - Current: 1.2.3
   - Bump MINOR: 1.3.x
   - Reset PATCH: 1.3.0
</THINKING>

Result: Semantic versioning that actually means something.

No human judgment required. Just systematic analysis.
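The same logic fits in a few lines. A sketch, where classifying the change is the input and the arithmetic is standard semver:

// Illustrative semantic-version bump, matching the version-manager's reasoning.
type BumpLevel = "major" | "minor" | "patch";

export function bumpVersion(current: string, level: BumpLevel): string {
  const [major, minor, patch] = current.split(".").map(Number);
  switch (level) {
    case "major": return `${major + 1}.0.0`;               // breaking change
    case "minor": return `${major}.${minor + 1}.0`;         // new backwards-compatible feature
    case "patch": return `${major}.${minor}.${patch + 1}`;  // bug fix only
  }
}

// bumpVersion("1.2.3", "minor") === "1.3.0"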

10. The Human Still Matters

SPARC doesn’t replace human judgment. It augments it.

Where AI excels:

Where humans are essential:

SPARC automates the mechanical parts so humans can focus on the strategic parts.

The Honest Limitations

This isn’t a magic solution. It has real constraints.

What SPARC Doesn’t Solve:

1. AI still hallucinates occasionally

2. Complex debugging needs human judgment

3. Architectural decisions need business context

4. Some domains need human expertise

What SPARC Dramatically Reduces:

1. Inconsistent output quality

2. Security vulnerabilities

3. Missing tests

4. Poor documentation

5. Version management chaos

How You Can Use This

The SPARC prompt library is open source.

GitHub: [github.com/finneh4249/sparc-prompts] (update with actual link)

To use it with Roo Code:

  1. Install Roo Code
  2. Load the custom modes configuration
  3. Give SPARC an objective
  4. Watch it orchestrate through the workflow

To adapt it for other tools:

Individual agent prompts work in:

Modify for your needs:

The methodology is transferable:

Example: Using the ‘code’ agent prompt in Claude

  1. Create a new Project in Claude
  2. Add this custom instruction:
You are a Senior Production Engineer specializing in Test-Driven
Development. Before generating any code, you MUST use <THINKING>
tags to:
1. Analyze the failing test assertions
2. Verify adherence to architectural constraints
3. Check for security issues (never hardcode secrets)
4. Plan your implementation

Use Temperature ≤ 0.3 for deterministic code generation.
Output only the necessary code to pass the failing tests.
  3. Provide your failing tests and specifications
  4. Get deterministic, security-conscious implementation

The Bottom Line

I didn’t build a framework. I built a systematic approach to prompt engineering that encodes production engineering practices into AI behavior.

3,000+ lines of prompts that implement:

This is prompt engineering at scale.

Not one-off ChatGPT requests. Not random experiments. A formal specification for how AI should collaborate on production systems.

The code isn’t magic. The architecture isn’t revolutionary. The testing isn’t novel.

What’s novel is encoding all of it into prompts.

Most developers use AI as a faster autocomplete. I use AI as a systematic development partner with explicit workflows, security mandates, and quality controls.

That’s the difference between AI-assisted coding and AI-orchestrated development.

And it works. Fusion Party runs on it. MagnetLab clients paid for it. This portfolio proves it.

The prompts are open source. The methodology is free. The results speak for themselves.


About the Author

I’m Ethan Cornwill. I’ve spent 13+ years writing software and 10 years managing QSR operations where downtime costs money and failures are public. I train LLMs at DataAnnotation, founded MagnetLab (AI consultancy), and serve as National Secretary for Fusion Party where I built their technical infrastructure.

I developed SPARC because I got tired of AI-generated code that looked good in demos and died in production.

Want to see it in action? Check out this portfolio. It was built using SPARC.

Contact: mail@finneh.xyz

GitHub: github.com/finneh4249
LinkedIn: linkedin.com/in/ethancornwill
Portfolio: finneh.xyz


This post was written using AI-human collaboration. I outlined the structure and key points, Claude helped with phrasing and examples, and I validated every technical claim against my actual experience. The reasoning was mine. The words were collaborative. The honesty is non-negotiable.

That’s SPARC in action.