---
version: "1.0.1"
name: autoresearch
description: "Autonomous improvement loop: scan codebase metrics, scaffold experiment files, run agent-driven iterations until metric improves"
argument-hint: "[--scaffold <loop-name>] [--run <loop-name>] [--status]"
effort: high
disable-model-invocation: true
---
Autoresearch — Autonomous Improvement Loop
Scan codebase quality metrics, propose improvement loops, and run autonomous agent iterations. Inspired by karpathy/autoresearch — adapted from ML research to code quality.
Concept: The agent proposes a code change, runs the measurement, keeps the change if the metric improved, reverts it with `git checkout -- .` if not, and repeats until manually stopped.
Time: Scan ~30s | Per iteration: depends on scope | Loop: runs indefinitely until you stop it
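The keep-or-revert decision in the concept above can be sketched as a small shell function. This is an illustration under stated assumptions, not part of the skill: the name `is_improvement` and its argument order are hypothetical, metrics are assumed to be integers, and the direction is assumed to be `lower` or `higher` as in `direction.txt`.

```shell
#!/usr/bin/env bash
# is_improvement PREVIOUS NEW [DIRECTION]
# Hypothetical helper: exits 0 only when NEW is strictly better than PREVIOUS.
# DIRECTION is "lower" (default) or "higher"; metrics are assumed to be integers.
is_improvement() {
    local previous=$1 new=$2 direction=${3:-lower}
    case "$direction" in
        lower)  [ "$new" -lt "$previous" ] ;;
        higher) [ "$new" -gt "$previous" ] ;;
        *)      echo "unknown direction: $direction" >&2; return 2 ;;
    esac
}

# An unchanged metric counts as "no improvement", so the change would be reverted.
if is_improvement 12 11 lower; then echo "KEEP"; else echo "REVERT"; fi
```

The comparison is strict on purpose: a change that leaves the metric unchanged adds churn without progress, which matches the "keep only if improved" rule.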
Mode 1: Scan (default)
Measure current state, detect existing loops, propose next actions.
Instructions
Run the following metrics and display a prioritized proposal table.
Step 1: Measure codebase metrics
Adapt grep patterns to your project's conventions. These are TypeScript defaults — adjust for your stack.
    # Each metric counts files that contain the pattern (grep -l), not individual occurrences.
    # M1: Function declarations (prefer arrow functions)
    M1=$(grep -r "export function " src/ --include="*.ts" --include="*.tsx" -l 2>/dev/null | wc -l | tr -d ' ')
    # M2: Interface declarations (prefer type aliases)
    M2=$(grep -r "export interface " src/ --include="*.ts" --include="*.tsx" -l 2>/dev/null | wc -l | tr -d ' ')
    # M3: ESLint disables
    M3=$(grep -r "eslint-disable" src/ --include="*.ts" --include="*.tsx" -l 2>/dev/null | wc -l | tr -d ' ')
    # M4: Type casts to any
    M4=$(grep -r " as any" src/ --include="*.ts" --include="*.tsx" -l 2>/dev/null | wc -l | tr -d ' ')
    # M5: TODO comments
    M5=$(grep -r "// TODO" src/ --include="*.ts" --include="*.tsx" -l 2>/dev/null | wc -l | tr -d ' ')
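Because the scan commands pass `-l` to grep, M1 to M5 count matching files, while a `measure.sh` without `-l` counts matching lines, so the two numbers can differ for the same pattern. A minimal sketch of both counting modes side by side, using hypothetical helper names (`count_files`, `count_lines`) and a throwaway directory with made-up sample files instead of a real `src/`:

```shell
#!/usr/bin/env bash
# Hypothetical helpers contrasting the two counting modes.
# "|| true" absorbs grep's exit status 1 when nothing matches.
count_files() {  # count_files PATTERN DIR: number of files containing PATTERN
    { grep -rl --include="*.ts" --include="*.tsx" -e "$1" "$2" 2>/dev/null || true; } | wc -l | tr -d ' '
}
count_lines() {  # count_lines PATTERN DIR: number of lines matching PATTERN
    { grep -r --include="*.ts" --include="*.tsx" -e "$1" "$2" 2>/dev/null || true; } | wc -l | tr -d ' '
}

# Sample data: two casts in one file, one cast in another.
demo=$(mktemp -d)
printf 'const a = x as any;\nconst b = y as any;\n' > "$demo/one.ts"
printf 'const c = z as any;\n' > "$demo/two.tsx"
echo "files: $(count_files ' as any' "$demo")"
echo "lines: $(count_lines ' as any' "$demo")"
rm -rf "$demo"
```

On the sample data this prints `files: 2` and `lines: 3`. Whichever mode you pick, using the same one in the scan and in `measure.sh` keeps the proposal table and the loop metric comparable.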
Step 2: Detect existing loops
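    for dir in scripts/autoresearch/loop-*/; do
        [ -d "$dir" ] || continue
        LOOP_NAME=$(basename "$dir")
        # A loop with a results log has been run at least once.
        if [[ -f "$dir/results.tsv" ]]; then
            ITERS=$(wc -l < "$dir/results.tsv" | tr -d ' ')
            # Best value of column 2; which end of the sort is "best" depends on direction.txt.
            DIRECTION=$(cat "$dir/direction.txt" 2>/dev/null || echo "lower")
            if [[ "$DIRECTION" == "higher" ]]; then
                BEST=$(cut -f2 "$dir/results.tsv" | sort -n | tail -n 1)
            else
                BEST=$(cut -f2 "$dir/results.tsv" | sort -n | head -n 1)
            fi
            echo "ACTIVE:$LOOP_NAME:iterations=$ITERS:best=$BEST"
        else
            echo "SCAFFOLDED:$LOOP_NAME"
        fi
    done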
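Which value counts as "best" depends on the loop's direction: the minimum for `lower`, the maximum for `higher`. A minimal sketch, assuming `results.tsv` uses the tab-separated layout described under `--run` (timestamp, metric, status, description) with integer metrics. The name `best_metric` and the sample rows are made up for illustration.

```shell
#!/usr/bin/env bash
# best_metric RESULTS_TSV [DIRECTION]
# Hypothetical helper: prints the best value found in column 2 of the log.
# DIRECTION is "lower" (default) or "higher"; column 2 is assumed to hold integers.
best_metric() {
    local file=$1 direction=${2:-lower}
    if [ "$direction" = "higher" ]; then
        cut -f2 "$file" | sort -n | tail -n 1
    else
        cut -f2 "$file" | sort -n | head -n 1
    fi
}

# Illustrative log with three made-up iterations.
log=$(mktemp)
printf '2024-01-01T00:00:00Z\t12\tKEPT\tfirst\n'      > "$log"
printf '2024-01-01T00:05:00Z\t9\tKEPT\tsecond\n'     >> "$log"
printf '2024-01-01T00:10:00Z\t10\tREVERTED\tthird\n' >> "$log"
best_metric "$log" lower
best_metric "$log" higher
rm -f "$log"
```

For the sample log this prints `9` and then `12`. Extracting column 2 with `cut` before sorting avoids sorting on the rest of the row.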
Step 3: Display
    Autoresearch Scan - {date}

    Codebase metrics:

    | # | Loop                | Metric           | Current | Target | Priority | Risk |
    |---|---------------------|------------------|---------|--------|----------|------|
    | A | loop-remove-as-any  | `as any` casts   | {M4}    | 0      | P1       | LOW  |
    | B | loop-eslint-disable | eslint-disable   | {M3}    | 0      | P2       | MED  |
    | C | loop-export-fn      | export function  | {M1}    | 0      | P1       | LOW  |
    | D | loop-interface-type | export interface | {M2}    | 0      | P1       | LOW  |
    | E | loop-todo-comments  | TODO comments    | {M5}    | 0      | P3       | LOW  |

    Existing loops: {detected loops or "none yet"}

    Recommended next step (P1, LOW risk):
    /autoresearch --scaffold loop-remove-as-any

    Then write program.md, create a worktree, and run the loop.
Mode 2: --scaffold <loop-name>
Generate the 3 mechanical files for a loop. Does not generate `program.md` — write that yourself to encode project-specific constraints.
Instructions
Create the following files under scripts/autoresearch/{loop-name}/:
`measure.sh` — the evaluation harness (single metric, returns an integer):
    #!/usr/bin/env bash
    # measure.sh: {loop-name}
    # Prints an integer. Direction: lower = better (unless loop targets coverage/score).
    set -euo pipefail
    # grep exits 1 when nothing matches; "|| true" keeps pipefail from failing the script at 0.
    { grep -r "PATTERN" src/ --include="*.ts" --include="*.tsx" 2>/dev/null || true; } | wc -l | tr -d ' '
`direction.txt` — improvement direction:
lower
(Use higher for metrics like test coverage or quality score.)
`files.txt` — scope the agent should operate on:
src/
After creating the files, display:
    Loop scaffolded: scripts/autoresearch/{loop-name}/

    measure.sh : {pattern} in {scope} → {N} occurrences today
    direction  : lower (fewer = better)
    files.txt  : src/

    Current metric: {N} (target: 0)

    Next steps:
    1. Write program.md: agent behavior, constraints, what it can/cannot touch
       Reference: scripts/autoresearch/loop-remove-as-any/program.md
    2. Create a worktree: /worktree feature/autoresearch-{loop-name}
    3. cd into the worktree
    4. bash scripts/autoresearch/runner.sh {loop-name} 0 15
Mode 3: --run <loop-name>
Execute the autonomous loop. There is no iteration cap: the agent keeps going until a stopping criterion from program.md is met or you stop it manually.
Instructions
Verify prerequisites:
    [ -f "scripts/autoresearch/{loop-name}/measure.sh" ] || { echo "ERROR: measure.sh missing. Run --scaffold first."; exit 1; }
    [ -f "scripts/autoresearch/{loop-name}/program.md" ] || { echo "ERROR: program.md missing. Write it first: it encodes your constraints."; exit 1; }
Run the loop:
Read scripts/autoresearch/{loop-name}/program.md fully before starting. Then enter the following cycle — repeat until stopped:
    LOOP ITERATION #{N}
    1. Current metric: bash scripts/autoresearch/{loop-name}/measure.sh
    2. Read program.md constraints
    3. Propose ONE targeted change to files under the paths listed in files.txt
    4. Apply the change
    5. Re-measure: bash scripts/autoresearch/{loop-name}/measure.sh
    6. Evaluate against direction.txt:
       - direction=lower AND new < previous, or direction=higher AND new > previous → KEEP (git add -u && git commit -m "autoresearch: {description}"; git add -p is interactive and would stall an unattended loop)
       - otherwise → REVERT (git checkout -- .)
    7. Log to results.tsv: {timestamp}\t{metric}\t{status}\t{description}, where {status} is KEPT or REVERTED
    8. Continue to iteration #{N+1}
Stopping criteria (from program.md):
- Metric reaches target (e.g., 0)
- No more mechanical changes possible
- User manually stops the process
Display each iteration:
[iter #{N}] metric: {before} → {after} | {KEPT/REVERTED} | {change description}
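Step 7 of the cycle, appending a row to `results.tsv`, could be sketched as follows. The helper name `log_result` is hypothetical, and the UTC ISO 8601 timestamp format is an assumption, since the skill leaves `{timestamp}` unspecified. The sample descriptions are made up.

```shell
#!/usr/bin/env bash
# log_result RESULTS_TSV METRIC STATUS DESCRIPTION
# Hypothetical helper: appends one tab-separated row to the results log.
# The timestamp format (UTC, ISO 8601) is an assumption, not mandated by the skill.
log_result() {
    local file=$1 metric=$2 status=$3 description=$4
    # Tabs or newlines inside the description would corrupt the row, so flatten them to spaces.
    description=$(printf '%s' "$description" | tr '\t\n' '  ')
    printf '%s\t%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$metric" "$status" "$description" >> "$file"
}

log=$(mktemp)
log_result "$log" 11 KEPT "remove one cast"
log_result "$log" 11 REVERTED "second attempt did not lower the count"
cut -f2,3 "$log"   # show only the metric and status columns
rm -f "$log"
```

Keeping each iteration on exactly one line matters because the scan and status scripts count iterations with `wc -l`.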
Mode 4: --status
Show status of all loops in the project.
Instructions
    for dir in scripts/autoresearch/loop-*/; do
        [ -d "$dir" ] || continue
        NAME=$(basename "$dir")
        # Show "?" only when measure.sh is missing or prints nothing.
        CURRENT=$(bash "$dir/measure.sh" 2>/dev/null || true)
        CURRENT=${CURRENT:-?}
        ITERS=0
        KEPT=0
        if [ -f "$dir/results.tsv" ]; then
            ITERS=$(wc -l < "$dir/results.tsv" | tr -d ' ')
            # grep -c prints 0 but exits 1 when nothing matches, so ignore its exit status.
            KEPT=$(grep -c "KEPT" "$dir/results.tsv" || true)
        fi
        echo "$NAME | current: $CURRENT | iters: $ITERS | kept: $KEPT"
    done
Display:
    Autoresearch Status

    | Loop               | Current | Iterations | Kept | Status     |
    |--------------------|---------|------------|------|------------|
    | loop-remove-as-any | {N}     | {N}        | {N}  | ACTIVE     |
    | loop-export-fn     | {N}     | 0          | 0    | SCAFFOLDED |
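The status script prints the current metric, iteration count, and kept count, but not the Status column. One way to derive it is to reuse the rule from the scan step: a loop with a `results.tsv` is ACTIVE, otherwise it is SCAFFOLDED. The helper name `loop_status` is hypothetical, and the sketch runs against a throwaway directory rather than `scripts/autoresearch/`.

```shell
#!/usr/bin/env bash
# loop_status LOOP_DIR
# Hypothetical helper: ACTIVE once results.tsv exists, SCAFFOLDED otherwise.
loop_status() {
    if [ -f "$1/results.tsv" ]; then
        echo "ACTIVE"
    else
        echo "SCAFFOLDED"
    fi
}

# Throwaway layout: one loop that has been run, one that has only been scaffolded.
root=$(mktemp -d)
mkdir -p "$root/loop-remove-as-any" "$root/loop-export-fn"
: > "$root/loop-remove-as-any/results.tsv"
for dir in "$root"/loop-*/; do
    echo "$(basename "$dir") | $(loop_status "$dir")"
done
rm -rf "$root"
```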
Writing program.md — The Most Important File
program.md is the agent's behavior contract. Write it yourself — never auto-generate it. It must encode what the agent can/cannot touch for your specific codebase.
Minimal structure:
    # Program: {loop-name}

    ## Objective
    Reduce `{metric}` in `src/` to 0. One mechanical change per iteration.

    ## Measurement
    bash scripts/autoresearch/{loop-name}/measure.sh
    Lower = better. Target: 0.

    ## What you CAN do
    - Replace `export function X(` with `export const X = (`
    - Keep the function signature identical

    ## What you CANNOT do
    - Modify test files
    - Change function signatures
    - Touch files outside src/
    - Make multiple changes per iteration

    ## Stop when
    - Metric = 0
    - No more mechanical replacements exist
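Since `--run` only checks that program.md exists, a quick structural check before starting a loop may be worthwhile. The sketch below assumes the section headings from the minimal structure above. `check_program_md` is a hypothetical helper, not something the skill provides.

```shell
#!/usr/bin/env bash
# check_program_md FILE
# Hypothetical lint: reports each section heading from the minimal structure that is missing.
# Returns 0 when all sections are present, 1 otherwise.
check_program_md() {
    local file=$1 missing=0 heading
    for heading in "## Objective" "## Measurement" "## What you CAN do" "## What you CANNOT do" "## Stop when"; do
        # -x matches the whole line, -F treats the heading as a fixed string.
        if ! grep -qxF -e "$heading" "$file"; then
            echo "missing section: $heading" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: a draft that still lacks its constraints and stopping criteria.
draft=$(mktemp)
printf '# Program: demo\n## Objective\n## Measurement\n## What you CAN do\n' > "$draft"
check_program_md "$draft" || echo "program.md is incomplete"
rm -f "$draft"
```

A check like this only confirms the sections exist. What they say remains a human decision.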
The Pattern (Background)
This command implements the autoresearch loop pattern from karpathy/autoresearch:
| ML Research (karpathy) | Code Quality (this command) |
|---|---|
| Modify train.py | Modify src/ files |
| Measure val_bpb | Measure grep count |
| 5-minute GPU budget | One atomic change per iteration |
| Keep if val_bpb improves | Keep if count decreases |
| git reset if not | git checkout -- . if not |
| program.md = agent skill | program.md = agent skill |
Key insight: a fixed, objective metric + git as rollback mechanism = safe autonomous iteration. The agent never needs human approval per change because any change that fails to improve the metric is automatically reverted.
Usage
Scan and propose loops:
/autoresearch
Scaffold files for a specific loop:
/autoresearch --scaffold loop-remove-as-any
Run the autonomous loop (after writing program.md):
/autoresearch --run loop-remove-as-any
Check status of all loops:
/autoresearch --status
$ARGUMENTS