Skill v1.0.2

Automated scan: 100/100

lllllllama/ai-paper-reproduction-skill/minimal-run-and-audit

5 files

──Details

Published: May 18, 2026 at 03:45 AM

Content Hash: sha256:f43a96de9a3e7d4c...

Git SHA: d02df96ed13b

Bump Type: patch

──Files

Files (1 file, 2.6 KB)

SKILL.md (2.6 KB, active)

SKILL.md · 57 lines · 2.6 KB

version: "1.0.2"
name: minimal-run-and-audit
description: RigorPilot trusted execution and reporting skill for README-first deep learning repo reproduction. Use when the task is specifically to capture or normalize evidence from the selected smoke test or documented inference or evaluation command and write standardized repro_outputs/ files, including patch notes when repository files changed. Do not use for training execution, initial repo intake, generic environment setup, paper lookup, target selection, hidden scientific-meaning changes, or end-to-end orchestration by itself.

minimal-run-and-audit

Use the shared operating principles in ../../references/agent-operating-principles.md; this skill should make run evidence auditable without turning every command into a rigid protocol.

When to apply

After a reproduction target and setup plan exist.
When the main skill needs execution evidence and normalized outputs.
When a smoke test, documented inference run, documented evaluation run, or other short non-training verification is appropriate.
When the user already knows what command should be attempted and wants execution plus reporting only.

When not to apply

During initial repo scanning.
When the environment or assets are still too undefined for execution to be meaningful.
When the task is a literature lookup rather than repository execution.
When the user is still deciding which reproduction target should count as the main run.

Clear boundaries

This skill owns normalized reporting for an attempted command.
It may receive execution evidence from the main skill or a thin helper.
It does not choose the overall target on its own.
It does not perform broad paper analysis.
It does not own training startup, resume, or long-running training state.
It should not normalize risky code edits into acceptable practice.
It must not hide changes that alter evaluation, preprocessing, checkpoints, metrics, or other scientific meaning.
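
The last boundary can be illustrated with a small check over changed files. This is a minimal sketch, not part of the skill: `ChangedFile`, `SENSITIVE_KEYWORDS`, and `flag_scientific_changes` are hypothetical names, and path-keyword matching is an assumed heuristic that only surfaces candidates for human review.

```python
from dataclasses import dataclass

# Hypothetical keyword list; a real repo would need its own, reviewed by a human.
SENSITIVE_KEYWORDS = ("eval", "preprocess", "checkpoint", "metric")


@dataclass
class ChangedFile:
    path: str
    reason: str  # why the file was edited


def flag_scientific_changes(changes: list[ChangedFile]) -> list[ChangedFile]:
    """Return changes whose path hints that scientific meaning may be affected."""
    return [
        c for c in changes
        if any(keyword in c.path.lower() for keyword in SENSITIVE_KEYWORDS)
    ]


changes = [
    ChangedFile("requirements.txt", "pin torch version"),
    ChangedFile("src/eval_metrics.py", "fix tensor device mismatch"),
]
flagged = flag_scientific_changes(changes)
for c in flagged:
    print(f"needs disclosure: {c.path} ({c.reason})")
```

A flagged change is not necessarily wrong; the point is that it gets reported rather than silently absorbed.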

Input expectations

selected reproduction goal
runnable commands or smoke commands
environment and asset assumptions
optional patch metadata
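
One way to picture these inputs is as a single record that is validated before anything runs. The sketch below is illustrative only; `RunRequest` and its field names are hypothetical and do not describe the interface of the skill's own scripts.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunRequest:
    """Hypothetical container for the four inputs listed above."""
    goal: str                                   # selected reproduction goal
    commands: list[str]                         # runnable or smoke commands
    assumptions: list[str] = field(default_factory=list)  # environment and asset assumptions
    patch_metadata: Optional[dict] = None       # optional patch metadata

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the request is usable."""
        problems = []
        if not self.goal.strip():
            problems.append("missing reproduction goal")
        if not self.commands:
            problems.append("no runnable or smoke command supplied")
        return problems


request = RunRequest(
    goal="documented inference run from README",
    commands=["python infer.py --help"],  # placeholder command, not from any real repo
    assumptions=["CPU only", "pretrained weights already downloaded"],
)
print(request.validate())
```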

Output expectations

execution result summary
standardized repro_outputs/ files
SCIENTIFIC_CHANGELOG.md for changed scientific meaning and evidence status
COMPARABILITY_REPORT.md for README/paper/baseline comparability
clear distinction between verified, partial, and blocked states
PATCHES.md when repo files changed
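
The outputs above could be produced along the following lines. This is a sketch under stated assumptions: the file names come from the list above, but `RunStatus`, `classify`, `write_outputs`, and the rule that maps an exit code to verified, partial, or blocked are hypothetical and are not taken from scripts/write_outputs.py.

```python
import tempfile
from enum import Enum
from pathlib import Path
from typing import Optional


class RunStatus(Enum):
    VERIFIED = "verified"
    PARTIAL = "partial"
    BLOCKED = "blocked"


def classify(exit_code: Optional[int], expected_output_seen: bool) -> RunStatus:
    """Assumed rule: no exit code means the command never ran (blocked)."""
    if exit_code is None:
        return RunStatus.BLOCKED
    if exit_code == 0 and expected_output_seen:
        return RunStatus.VERIFIED
    return RunStatus.PARTIAL


def write_outputs(out_dir: Path, status: RunStatus, changelog: str,
                  comparability: str, patches: Optional[str] = None) -> list[str]:
    """Write the report files and return their names in sorted order."""
    out_dir.mkdir(parents=True, exist_ok=True)
    files = {
        "SCIENTIFIC_CHANGELOG.md": f"Status: {status.value}\n\n{changelog}\n",
        "COMPARABILITY_REPORT.md": comparability + "\n",
    }
    if patches:  # PATCHES.md is written only when repo files changed
        files["PATCHES.md"] = patches + "\n"
    for name, body in files.items():
        (out_dir / name).write_text(body, encoding="utf-8")
    return sorted(files)


with tempfile.TemporaryDirectory() as tmp:
    status = classify(exit_code=0, expected_output_seen=False)
    written = write_outputs(
        Path(tmp) / "repro_outputs",
        status,
        changelog="No scientific changes.",
        comparability="Not yet compared against README numbers.",
    )
    print(status.value, written)
```

Keeping the status an explicit enum makes it harder for a run that merely exited cleanly to be reported as verified.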

Notes

Use references/reporting-policy.md, ../../references/research-rigor-principles.md, scripts/run_command.py, and scripts/write_outputs.py.
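
For orientation, capturing auditable evidence from a single command might look roughly like this. The interface of scripts/run_command.py is not shown here, so `capture_run` and the keys of the returned record are assumptions made for illustration, using only the standard library.

```python
import subprocess
import sys
import time
from typing import Optional


def capture_run(argv: list[str], timeout_s: float = 60.0) -> dict:
    """Run one command and return a plain record of what happened."""
    started = time.time()
    exit_code: Optional[int] = None
    stdout, stderr, note = "", "", ""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
        exit_code, stdout, stderr = proc.returncode, proc.stdout, proc.stderr
    except subprocess.TimeoutExpired:
        note = f"timed out after {timeout_s} s"
    except FileNotFoundError:
        note = "command not found"
    return {
        "argv": argv,
        "exit_code": exit_code,  # None means the command never completed
        "stdout": stdout,
        "stderr": stderr,
        "note": note,
        "duration_s": round(time.time() - started, 3),
    }


# A trivial stand-in for a smoke command.
evidence = capture_run([sys.executable, "-c", "print('ok')"])
print(evidence["exit_code"], evidence["stdout"].strip())
```

Recording the exact argument list alongside the exit code and output keeps the evidence reproducible without prescribing how each command must be run.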
