Building Ralph for Claude Code: An Autonomous AI Coding Loop Done Right

Discovering Ralph

Over the holidays, I kept seeing excitement about “Ralph” in AI coding circles - an autonomous coding loop technique that lets Claude Code work through tasks iteratively without constant human intervention. I started experimenting with it in early January, refining my approach until it worked reliably.

The concept comes from Geoffrey Huntley’s original technique, which is beautifully simple:

while :; do cat PROMPT.md | claude-code ; done

That’s it. A shell loop that feeds a prompt to Claude Code, lets it work, and when Claude exits, feeds the same prompt again. Each iteration is a fresh process with fresh context.

This simplicity masks a profound insight: by starting fresh each iteration, Claude must re-discover its work through files, not conversation memory. There’s no context rot from accumulated tool calls and intermediate reasoning. Each iteration reads the current state of the codebase and decides what to do next.

Why does this matter? Long-running Claude Code sessions accumulate context: previous file reads, tool call history, intermediate reasoning, and stale information from earlier in the session. After 10-20 iterations, this bloat slows responses, degrades reasoning quality, and eventually hits the context limit. Starting each iteration with fresh context avoids all three.

The Problem with Existing Implementations

After discovering Ralph, I naturally searched for existing implementations. I tried several community versions that were shared and recommended. Then I found the official Anthropic ralph-loop plugin and thought: “Perfect, the official implementation should be the gold standard.”

It wasn’t.

The Anthropic plugin uses Stop hooks to intercept session exit and feed the prompt back, keeping everything within a single session. This sounds efficient, but it fundamentally deviates from the original Ralph philosophy. Instead of fresh context each iteration, context accumulates.

Issue #16440 on the Claude Code repository documents this problem:

The original Ralph technique uses an external Bash loop… Each iteration is a new process with fresh context. The ralph-wiggum plugin attempts to replicate this using Stop hooks, but currently the context accumulates instead of resetting.

Here’s how the approaches compare:

Aspect	Original Ralph	Anthropic Plugin	My Implementation
Context per iteration	Fresh (new process)	Accumulated (same session)	Fresh (new process)
Context size over time	Constant (~40k tokens)	Growing (40k → 200k+ tokens)	Constant (~40k tokens)
Iteration limit	Unlimited	~20 before overflow	Unlimited (with -u flag)
Learning persistence	Via files	Via conversation memory	Via `WORKLOG.md`
Behavior consistency	Consistent	Degrades over time	Consistent

The issues are real:

Context overflow: After 10-20 iterations, context exceeds limits
Degraded reasoning: Attention gets diluted across irrelevant history
Different behavior: The accumulated approach produces fundamentally different results

The Solution: ralph-claude-code

I built ralph-claude-code to solve these problems while adding practical improvements for real-world use.

The core philosophy: fresh context each iteration, but learnings persist through files.

This is accomplished through a two-file system:

BRIEF.md: Your static task specification (never changes during execution except to mark completion)
WORKLOG.md: Dynamic learnings accumulated across iterations

┌─────────────────────────────────────────────────────────────┐
│                     Ralph Iteration Loop                    │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
                    ┌──────────────────┐
                    │  Read BRIEF.md   │
                    │  Find first [ ]  │
                    └─────────┬────────┘
                              │
                              ▼
                    ┌──────────────────┐
                    │ Read WORKLOG.md  │
                    │ Check learnings  │
                    └─────────┬────────┘
                              │
                              ▼
                    ┌──────────────────┐
                    │   Execute task   │
                    │  (one at a time) │
                    └─────────┬────────┘
                              │
                              ▼
                    ┌──────────────────┐
                    │     Validate     │
                    │  tests/lint/type │
                    └─────────┬────────┘
                              │
              ┌───────────────┴───────────────┐
              │                               │
              ▼                               ▼
     ┌────────────────┐              ┌────────────────┐
     │   PASS         │              │   FAIL         │
     │ Mark [x]       │              │ Leave [ ]      │
     │ Commit changes │              │ Log learnings  │
     │ Log success    │              │ Next iteration │
     └────────┬───────┘              └────────┬───────┘
              │                               │
              └───────────────┬───────────────┘
                              │
                              ▼
                    ┌──────────────────┐
                    │  All tasks [x]?  │
                    └─────────┬────────┘
                              │
              ┌───────────────┴───────────────┐
              │                               │
              ▼                               ▼
     ┌────────────────┐              ┌────────────────┐
     │      YES       │              │       NO       │
     │    COMPLETE    │              │ Next iteration │
     └────────────────┘              └────────────────┘

Each iteration spawns a new claude process, reads the current state from files, and works with fresh context. Failed attempts are logged to WORKLOG.md, so the next iteration can learn from mistakes without carrying the baggage of accumulated conversation history.
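The "All tasks [x]?" decision in the diagram can be approximated by counting unchecked boxes in BRIEF.md. This is a minimal sketch that assumes the checkbox syntax shown later in this post; the function names are hypothetical and not taken from ralph's source.

```shell
# Hypothetical helpers, not ralph's actual code.

# Print the number of unchecked "- [ ]" items in a brief file.
count_open_items() {
    local brief="$1"
    # grep -c prints 0 but exits 1 when nothing matches, so ignore the status.
    grep -c -E '^[[:space:]]*- \[ \]' "$brief" || true
}

# Succeed only when no unchecked items remain.
all_tasks_done() {
    [ "$(count_open_items "$1")" -eq 0 ]
}
```

Because the state lives entirely in the file, a check like `all_tasks_done BRIEF.md` gives the same answer no matter which iteration runs it.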

Key Features

Fresh Context Per Iteration

The core loop spawns a new process each time:

# Simplified from the actual implementation
while true; do
    timeout "$TIMEOUT" claude --print "$PROMPT" | tee "$OUTPUT_FILE"

    if grep -q "<promise>COMPLETE</promise>" "$OUTPUT_FILE"; then
        break  # All tasks done
    fi
done

Each claude invocation starts fresh. No accumulated context, no degradation over time.
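To show how an iteration cap might sit around that loop, here is a sketch under stated assumptions: `run_iteration`, `MAX_ITERATIONS`, and `SLEEP_SECONDS` are illustrative names, the defaults mirror the "10 iterations, 3s sleep" mentioned in this post, and treating 0 as "unlimited" is my own convention rather than ralph's.

```shell
# Sketch only; names and conventions here are assumptions.

# One iteration. In real use this starts a brand-new claude process,
# so no conversation state survives between calls.
run_iteration() {
    claude --print "$PROMPT"
}

ralph_loop() {
    local max="${MAX_ITERATIONS:-10}" pause="${SLEEP_SECONDS:-3}"
    local output i=1
    output="$(mktemp)"
    while [ "$max" -eq 0 ] || [ "$i" -le "$max" ]; do   # max=0: no cap
        run_iteration "$i" > "$output" 2>&1 || true
        if grep -q '<promise>COMPLETE</promise>' "$output"; then
            rm -f "$output"
            return 0        # completion marker seen
        fi
        i=$((i + 1))
        sleep "$pause"
    done
    rm -f "$output"
    return 1                # cap reached without the completion marker
}
```

Keeping the per-iteration command in its own function makes the loop easy to exercise with a stub before pointing it at a real `claude` run.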

Learning Loop Architecture

The WORKLOG.md file accumulates learnings without context bloat:

# Work Log

## Learnings
- Tests use BATS framework: run with `bats test/`
- Config is loaded from ~/.myapprc
- The API uses snake_case for all parameters

---

## Iteration 1 - TASK-001: Create config module
- What was implemented: Config loading with defaults
- Files changed: src/config.js, test/config.test.js
- Learnings for future iterations:
  - Use dotenv for env vars
  - Config validates on load
---

## Iteration 2 - TASK-001: Create config module (retry)
- Previous attempt failed: Missing required ENV var
- What was fixed: Added fallback defaults
- Files changed: src/config.js
---

The key insight: Claude reads this file at the start of each iteration, so it has access to all learnings from previous iterations - but only the distilled learnings, not the full conversation history.
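However the entries get written (by Claude following the prompt, or by the wrapper script), the format is plain appended Markdown. A small sketch of an appender follows; `log_iteration` is a hypothetical helper, not part of ralph.

```shell
# Hypothetical helper: append one iteration entry to the work log.
log_iteration() {
    local worklog="$1" iteration="$2" task="$3" summary="$4"
    {
        printf '## Iteration %s - %s\n' "$iteration" "$task"
        # "--" stops printf from reading the leading "-" as an option.
        printf -- '- What was implemented: %s\n' "$summary"
        printf -- '---\n\n'
    } >> "$worklog"
}
```

Appending rather than rewriting keeps earlier learnings intact, which is what lets a later iteration see why a previous attempt failed.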

The /brief Skill

Writing good BRIEF.md files is surprisingly tricky and time-consuming. Tasks need to be:

Small enough to fit in one iteration’s context window
Sequenced correctly (data layer before UI)
Verifiable with objective acceptance criteria

The /brief skill automates this through Claude’s AskUserQuestion tool:

claude
# Then type: /brief I want my application to do xyz

The skill guides you through interactive requirements gathering:

Existing file detection: Checks if BRIEF.md exists and asks what to do
Task scoping: Ensures each task fits within a single iteration - oversized tasks are decomposed
Dependency ordering: Sequences tasks correctly (data layer → logic → UI)
Verifiable criteria: Every acceptance criterion is objectively checkable
Required validations: Automatically adds “Testing passes” and “Linting passes” to each task

Flexible Controls

# Run with defaults (10 iterations, 3s sleep between each)
ralph

# Run up to 20 iterations
ralph -n 20

# Run unlimited iterations until all tasks complete
ralph -u

# Run with no pause between iterations (fastest)
ralph -S

# Reset worklog and start fresh
ralph -r

# Keep existing worklog and continue
ralph -k

# Preview what would happen without running
ralph -d

# Combine flags: unlimited iterations, no sleep, verbose
ralph -u -S -v
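Short flags like these are commonly parsed with bash's `getopts` builtin. The sketch below covers only `-n`, `-u`, `-S`, and `-v`; the variable names, and the use of 0 to mean "unlimited", are assumptions for illustration rather than ralph's actual internals.

```shell
# Sketch of short-flag parsing with getopts. Variable names are assumed.
parse_flags() {
    MAX_ITERATIONS=10      # default from the post: 10 iterations
    SLEEP_SECONDS=3        # default from the post: 3s between iterations
    VERBOSE=0
    local opt OPTIND=1     # local OPTIND lets the function be called repeatedly
    while getopts 'n:uSv' opt; do
        case "$opt" in
            n) MAX_ITERATIONS="$OPTARG" ;;
            u) MAX_ITERATIONS=0 ;;       # 0 stands for "unlimited" here
            S) SLEEP_SECONDS=0 ;;
            v) VERBOSE=1 ;;
            *) return 1 ;;               # unknown flag
        esac
    done
}
```

Calling `parse_flags -u -S -v` would leave `MAX_ITERATIONS=0`, `SLEEP_SECONDS=0`, and `VERBOSE=1`.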

Edge Cases and Solutions

This is where the real work went - handling all the things that go wrong in practice:

File Cleanup and Creation

Ralph handles BRIEF.md and WORKLOG.md intelligently:

Checks if BRIEF.md exists before starting (errors if missing)
Prompts about existing WORKLOG.md (reset or continue?)
The --reset and --keep flags skip the prompt for scripted use
The --cleanup flag removes both files after completion
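The checks above could look roughly like this. It is a sketch, not ralph's code: `prepare_files` and its `reset`/`keep`/`ask` modes are hypothetical names standing in for the `--reset` and `--keep` behaviour, and the initial worklog template is assumed from the example earlier in this post.

```shell
# Hypothetical pre-flight check. mode is "reset", "keep", or "ask".
prepare_files() {
    local mode="${1:-ask}" answer
    if [ ! -f BRIEF.md ]; then
        echo "Error: BRIEF.md not found" >&2
        return 1
    fi
    if [ -f WORKLOG.md ] && [ "$mode" = "ask" ]; then
        read -r -p "WORKLOG.md exists. Reset it? (y/n) " answer
        if [ "$answer" = "y" ]; then mode="reset"; else mode="keep"; fi
    fi
    # Start a new log when resetting or when none exists yet.
    if [ "$mode" = "reset" ] || [ ! -f WORKLOG.md ]; then
        printf '# Work Log\n\n## Learnings\n\n---\n\n' > WORKLOG.md
    fi
}
```

Passing the mode explicitly is what makes scripted runs possible, since no prompt is shown unless the mode is `ask`.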

Gitignore Management

You probably don’t want to commit BRIEF.md and WORKLOG.md to your repo. Ralph auto-detects missing gitignore entries:

⚠️  Workflow files not in .gitignore
    BRIEF.md and WORKLOG.md should probably be gitignored.

    Add them now? (y/n)

The --add-gitignore flag auto-adds them, and --skip-gitignore skips the check entirely.
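A simple version of that detection is an exact-line search of `.gitignore`. The helpers below are hypothetical and deliberately naive: they only look for literal `BRIEF.md` and `WORKLOG.md` lines and would not notice a broader pattern such as `*.md`.

```shell
# Hypothetical helper: print workflow files that the gitignore file does not list.
missing_gitignore_entries() {
    local gitignore="${1:-.gitignore}" entry
    for entry in BRIEF.md WORKLOG.md; do
        # -x matches the whole line, -F treats the entry as a literal string.
        if ! grep -qxF -- "$entry" "$gitignore" 2>/dev/null; then
            echo "$entry"
        fi
    done
}

# Append whatever is missing; does nothing when both entries are present.
add_gitignore_entries() {
    local gitignore="${1:-.gitignore}" missing
    missing="$(missing_gitignore_entries "$gitignore")"
    [ -z "$missing" ] || printf '%s\n' "$missing" >> "$gitignore"
}
```

Inside a real repository, `git check-ignore -q BRIEF.md` is a more thorough test because it honours every ignore rule git itself applies.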

Stuck Iteration Timeouts

Sometimes Claude gets stuck. Each iteration has a configurable timeout (default 10 minutes):

ralph --timeout 3600  # 1 hour timeout per iteration

Timeouts are logged to WORKLOG.md so the next iteration knows something went wrong and can try a different approach.
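One way to detect and record a timeout relies on GNU `timeout` exiting with status 124 when the limit is hit. This is a sketch under that assumption; `run_with_timeout` and the wording of the log line are illustrative, not ralph's actual output.

```shell
# Sketch: run a command under a time limit and note a timeout in the work log.
# On macOS, coreutils from Homebrew installs the command as gtimeout.
run_with_timeout() {
    local seconds="$1" worklog="$2" timeout_cmd status=0
    shift 2
    timeout_cmd="$(command -v timeout || command -v gtimeout)" || return 127
    "$timeout_cmd" "$seconds" "$@" || status=$?
    if [ "$status" -eq 124 ]; then    # 124: the time limit was reached
        printf -- '- Iteration timed out after %ss\n' "$seconds" >> "$worklog"
    fi
    return "$status"
}
```

For example, `run_with_timeout 600 WORKLOG.md claude --print "$PROMPT"` would apply the 10-minute default described above and leave a note for the next iteration if the limit is hit.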

Debugging with Prompt Output

When things consistently fail, you need to see exactly what Claude is doing:

ralph -P  # or --prompt

This outputs the exact prompt being sent to Claude, which you can paste into an interactive session to debug why iterations are failing.

Signal Handling

Clean Ctrl+C interruption with proper exit codes:

Exit 0: All tasks completed successfully
Exit 1: Error or max iterations reached
Exit 130: Interrupted by user (Ctrl+C)
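Exit code 130 follows the shell convention of 128 plus the signal number (SIGINT is 2). A minimal sketch of a handler that produces it is below; the function names are hypothetical.

```shell
# Sketch of an interrupt handler; names are assumptions, not ralph's code.
on_interrupt() {
    echo "Interrupted by user" >&2
    exit 130               # 128 + 2 (SIGINT)
}

# Register the handler so Ctrl+C runs it instead of killing the script abruptly.
install_interrupt_trap() {
    trap on_interrupt INT
}
```

A script would call `install_interrupt_trap` once before entering its loop; any cleanup, such as removing temporary files, would go inside `on_interrupt` ahead of the `exit`.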

The BRIEF.md Format

Here’s what a well-structured brief looks like:

# Brief: Comment Threading System

## Introduction

Enable nested replies on comments so users can have focused discussions.
Comments can have replies up to 3 levels deep, with collapse/expand controls
and visual indentation.

## Objectives

- Support threaded replies on any comment
- Limit nesting to 3 levels to maintain readability
- Allow collapsing/expanding reply threads
- Show reply count on collapsed threads

## Tasks

### TASK-001: Add parent reference to comments table
**Description:** As a developer, I need to track comment relationships
so replies link to their parent.

**Acceptance Criteria:**
- [ ] Add nullable `parent_id` foreign key column
- [ ] Add index on `parent_id` for query performance
- [ ] Migration runs without errors
- [ ] Testing passes
- [ ] Linting passes

### TASK-002: Create reply submission endpoint
**Description:** As a user, I need to submit a reply to an existing comment.

**Acceptance Criteria:**
- [ ] POST endpoint accepts `parent_id` and `content`
- [ ] Validates parent exists and nesting depth <= 3
- [ ] Returns 422 if max depth exceeded
- [ ] Testing passes
- [ ] Linting passes

### TASK-003: Render nested comment tree
**Description:** As a user, I want to see replies indented beneath
their parent comment.

**Acceptance Criteria:**
- [ ] Replies render with increasing left margin per level
- [ ] Maximum 3 indentation levels displayed
- [ ] Reply count badge shows on comments with replies
- [ ] Testing passes
- [ ] Linting passes
- [ ] Verify changes work in browser

## Out of Scope

- No @mentions or notifications for replies
- No editing or deleting replies after posting
- No pagination within threads

## Implementation Notes

- Leverage existing Comment component, add depth prop
- Use recursive rendering for nested structure

Key principles:

Each task is small enough for one iteration
Tasks are sequenced by dependencies (schema → API → UI)
Acceptance criteria are objectively verifiable
Every task includes “Testing passes” and “Linting passes”
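The last principle is easy to check mechanically. The following sketch is a hypothetical lint, assuming the `### TASK-NNN:` headings and checkbox lines shown in the example brief; it prints the ID of any task missing either required validation.

```shell
# Hypothetical lint: list tasks that lack "Testing passes" or "Linting passes".
tasks_missing_validations() {
    awk '
        # Print the current task ID if it is missing a required line.
        function report() {
            if (id != "" && !(tests && lint)) print id
        }
        /^### TASK-[0-9]+/ {
            report()
            id = $2; sub(/:$/, "", id)   # "TASK-001:" becomes "TASK-001"
            tests = 0; lint = 0
            next
        }
        /^## / { report(); id = "" }     # a new top-level section ends the task
        /- \[[ x]\] Testing passes/ { tests = 1 }
        /- \[[ x]\] Linting passes/ { lint = 1 }
        END { report() }
    ' "$1"
}
```

Running `tasks_missing_validations BRIEF.md` before starting a long unattended run prints nothing when every task carries both checks.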

Installation

Prerequisites

Bash 4.0+: Ralph uses modern bash features
  - macOS: Install via Homebrew (brew install bash) - the system bash is v3.x
  - Linux: Usually already installed
Claude Code CLI: The claude command must be in your PATH
  - Install from: https://docs.anthropic.com/en/docs/claude-code
timeout command: Used for iteration timeouts
  - macOS: Install via Homebrew (brew install coreutils) - provides gtimeout
  - Linux: Usually already installed as timeout
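These prerequisites can be verified from a shell before installing. The helpers below are a hypothetical sketch, not something shipped with ralph.

```shell
# Hypothetical pre-flight helpers for the prerequisites listed above.

# Print each named command that is not found in PATH.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Succeed when the running shell is bash with major version 4 or newer.
bash_is_modern() {
    local major="${BASH_VERSION%%.*}"   # "5.2.15(1)-release" becomes "5"
    [ "${major:-0}" -ge 4 ]
}
```

For instance, `missing_commands claude timeout` prints nothing on a ready Linux machine, while on macOS you would check for `gtimeout` instead of `timeout`.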

Installation Steps

# Clone the repository
git clone https://github.com/mmenanno/ralph-claude-code.git
cd ralph-claude-code

# Make ralph executable
chmod +x ralph

# Symlink to PATH
ln -s "$(pwd)/ralph" /usr/local/bin/ralph

# Install the brief skill (optional but recommended)
mkdir -p ~/.claude/skills
ln -s "$(pwd)/skill/brief" ~/.claude/skills/brief

# Verify installation
ralph --help

Real-World Usage

My typical workflow:

# Navigate to project
cd my-project

# Create a brief using the skill
claude
# > /brief My application needs feature xyz added
# > to it with these requirements...

# Run ralph
ralph -u  # unlimited iterations until complete

# When done, review the results then clean up
ralph --cleanup

When to Use Ralph vs. Normal Sessions

Use Ralph for:

Multi-step features with clear acceptance criteria
Tasks that benefit from fresh context between steps
Long-running work that would hit context limits
Projects with good test coverage (Ralph needs validation)
Prototyping where you have an idea and a direction and just want to see how far Claude can get on its own

Use normal Claude Code for:

Exploratory work where you’re still figuring things out
Quick fixes and single-task work
Interactive debugging and pairing

Lessons Learned

Understanding Original Intent Matters

The Anthropic plugin tried to be clever by using Stop hooks to keep everything in one session. This breaks the core insight of Ralph: fresh context prevents degradation. Sometimes the simple approach (spawn new processes) is better than the clever approach (hooks and session management).

Fresh Context Is a Feature, Not a Bug

It’s tempting to think accumulated context would help - Claude would “remember” what it already tried. In practice, the opposite is true. Fresh context with file-based learnings produces better results than accumulated context with conversation memory.

Try It Yourself

Ralph is MIT licensed and available on GitHub:

github.com/mmenanno/ralph-claude-code

Contributions welcome! The repository includes:

Complete implementation with 130+ tests
The /brief skill for creating well-structured briefs
Comprehensive documentation
CONTRIBUTING.md with development guidelines

If you’ve been frustrated by context limits in long Claude Code sessions, or want to automate multi-step development workflows, give Ralph a try. The fresh-context-per-iteration approach makes a real difference.
