CLI Commands Reference

Essential Commands

Main Workflow:

stackbench setup - Set up repository for IDE execution (clone + extract)
stackbench print-prompt - Get use case prompts for manual execution
stackbench analyze - Analyze implementations and generate reports

Management:

stackbench list - List all benchmark runs
stackbench status - Check run status and progress
stackbench clean - Clean up old runs

Main Commands

`stackbench setup`

Clone a repository and extract use cases from it so they can be executed in the IDE.

stackbench setup <repository-url> -a cursor -i docs,examples -l python

Common options:

-a cursor - Select the agent (IDE) that will execute the use cases
-i docs,examples - Focus on specific folders
-b develop - Use a specific branch
-l javascript - Specify the programming language (python/py, javascript/js, typescript/ts)
--num-use-cases 10 - Set how many use cases to generate (10 here)

Returns: A run ID, which the other commands take as <run-id>

`stackbench print-prompt`

Get formatted prompt for manual IDE execution.

# Get use case 1 prompt and copy to clipboard
stackbench print-prompt <run-id> -u 1 --copy

# Just print without copying
stackbench print-prompt <run-id> -u 1
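
To step through every use case in a run, the invocation above can be wrapped in a loop. This is a minimal sketch, not part of stackbench: `print_all_prompts` is a hypothetical helper, and it assumes use cases are numbered consecutively from 1, with the count defaulting to 5 (the documented NUM_USE_CASES default).

```shell
# Hypothetical helper: print the prompt for use cases 1..N of a run.
# Assumes use cases are numbered consecutively starting at 1.
print_all_prompts() {
  run_id=$1
  count=${2:-5}   # 5 is the documented NUM_USE_CASES default
  i=1
  while [ "$i" -le "$count" ]; do
    echo "=== use case $i ==="
    stackbench print-prompt "$run_id" -u "$i"
    i=$((i + 1))
  done
}
```

Call it as `print_all_prompts <run-id> 10` for a run set up with 10 use cases. The sketch leaves out `--copy` because each copy would overwrite the previous clipboard contents.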

`stackbench analyze`

Analyze implementations and generate reports.

# Analyze all use cases
stackbench analyze <run-id>

# Analyze specific use case
stackbench analyze <run-id> --use-case 2

# Force re-analysis
stackbench analyze <run-id> --force

Prerequisites:

Claude Code CLI must be installed: npm install -g @anthropic-ai/claude-code
ANTHROPIC_API_KEY must be set in the environment
Use case solution files must exist
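
The first two prerequisites can be checked from the shell before running the analysis. This is a minimal sketch: `check_analyze_prereqs` is a hypothetical helper, not a stackbench command. It does not check for solution files, because this reference does not say where they are stored.

```shell
# Hypothetical preflight for `stackbench analyze`; not part of stackbench itself.
check_analyze_prereqs() {
  missing=0
  # The @anthropic-ai/claude-code package installs a `claude` executable.
  if ! command -v claude >/dev/null 2>&1; then
    echo "missing: Claude Code CLI (npm install -g @anthropic-ai/claude-code)" >&2
    missing=1
  fi
  # The key must be set and non-empty (export it so child processes see it).
  if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
    echo "missing: ANTHROPIC_API_KEY" >&2
    missing=1
  fi
  return "$missing"
}
```

One way to use it is `check_analyze_prereqs && stackbench analyze <run-id>`, so the analysis starts only when both checks pass.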

Management Commands

`stackbench list`

List all benchmark runs.

# List all runs
stackbench list

# Filter by phase or agent
stackbench list --phase completed
stackbench list --agent cursor

`stackbench status`

Check run status and progress.

stackbench status <run-id>

Shows current phase, use case execution status, and next steps.

`stackbench clean`

Clean up old benchmark runs.

# Remove runs older than 30 days (default)
stackbench clean

# Remove runs older than 7 days
stackbench clean --older-than 7

# Preview what would be deleted
stackbench clean --dry-run
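
The preview and the deletion can be chained so that nothing is removed without confirmation. This is a minimal sketch: `confirm_clean` is a hypothetical wrapper, and it assumes that `--older-than` and `--dry-run` can be combined in one invocation, which this reference does not state.

```shell
# Hypothetical wrapper: preview a cleanup, then delete only after confirmation.
# Assumes --older-than and --dry-run can be combined in a single call.
confirm_clean() {
  days=${1:-30}   # 30 days is the documented default
  stackbench clean --older-than "$days" --dry-run || return 1
  printf 'Delete these runs? [y/N] '
  read -r answer || answer=n
  case "$answer" in
    y|Y) stackbench clean --older-than "$days" ;;
    *)   echo "aborted" ;;
  esac
}
```

Running `confirm_clean 7` shows what would be removed for a 7-day cutoff, and only runs the real deletion if the answer is `y`.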

Configuration

Required environment variables:

OPENAI_API_KEY=your_openai_key_here     # For use case extraction
ANTHROPIC_API_KEY=your_anthropic_key    # For analysis

Common settings (optional):

NUM_USE_CASES=10                        # Default: 5
ANALYSIS_MAX_WORKERS=5                  # Default: 3
DSPY_MODEL=gpt-4o                       # Default: gpt-4o-mini
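
To see which optional settings a run would use, the values can be printed with their documented defaults filled in. This is a minimal sketch: `show_settings` is a hypothetical helper, and it assumes stackbench reads these variables from the process environment and applies the defaults listed above.

```shell
# Hypothetical helper: print the optional settings with documented defaults applied.
# Assumes stackbench reads these variables from the process environment.
show_settings() {
  echo "NUM_USE_CASES=${NUM_USE_CASES:-5}"
  echo "ANALYSIS_MAX_WORKERS=${ANALYSIS_MAX_WORKERS:-3}"
  echo "DSPY_MODEL=${DSPY_MODEL:-gpt-4o-mini}"
}
```

The helper reports only what is set in the current shell. A value defined somewhere else (for example in a file that stackbench loads itself) would not appear here.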

Quick Start Workflow

# 1. Set up repository
stackbench setup https://github.com/user/awesome-lib -i docs

# 2. Execute use cases in Cursor IDE
# ⚠️ Wait for Cursor indexing to complete before pasting the first prompt!
stackbench print-prompt <run-id> -u 1 --copy
# Paste into Cursor, let it implement the use case, then repeat for each remaining use case

# 3. Analyze results
stackbench analyze <run-id>

# 4. View results
cat ./data/<run-id>/results.md

Next Steps

Getting Started - Complete tutorial walkthrough
How StackBench Works - Understanding the internals