Skill Studio

Write what good looks like. Benchmark against any LLM. Track quality over time. Runs 100% locally.

Features

Plain-English Evals

Describe expected behavior in natural language. An AI judge evaluates outputs semantically — no test code required.
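The sketch below shows one common way an AI-judge check like this can work (LLM-as-judge). It makes several assumptions. The judge model answers with a one-line PASS or FAIL verdict. The judge call is a plain synchronous function; a real model call would be asynchronous. The names `PlainEnglishEval`, `JudgeModel`, `buildJudgePrompt`, `parseVerdict`, and `runEval` are hypothetical illustrations and are not vskill's actual API or file format.

```typescript
// Hypothetical shape of a plain-English eval: a name plus a natural-language
// description of what a good output looks like. Not vskill's actual format.
interface PlainEnglishEval {
  name: string;
  expectation: string;
}

interface Verdict {
  pass: boolean;
  reason: string;
}

// Any function that sends a prompt to a judge model and returns its text reply.
// Assumption: synchronous for brevity; a real model call would be async.
type JudgeModel = (prompt: string) => string;

// Ask the judge to grade the output against the expectation, answering on one line.
function buildJudgePrompt(check: PlainEnglishEval, output: string): string {
  return [
    "You are grading the output of an AI skill.",
    `Expected behavior: ${check.expectation}`,
    "Output to grade:",
    output,
    'Answer on the first line with "PASS: <reason>" or "FAIL: <reason>".',
  ].join("\n");
}

// Read the verdict from the first line of the reply.
// Anything that is not an explicit PASS is treated as a failure.
function parseVerdict(reply: string): Verdict {
  const firstLine = reply.trim().split("\n")[0] ?? "";
  const match = /^(PASS|FAIL)\b\s*:?\s*(.*)$/i.exec(firstLine);
  if (match === null) {
    return { pass: false, reason: `Unparseable judge reply: ${firstLine}` };
  }
  const [, label = "", reason = ""] = match;
  return { pass: label.toUpperCase() === "PASS", reason: reason.trim() };
}

// Grade one output: build the prompt, call the judge, parse its verdict.
function runEval(check: PlainEnglishEval, output: string, judge: JudgeModel): Verdict {
  return parseVerdict(judge(buildJudgePrompt(check, output)));
}
```

Treating an unparseable reply as a failure is a deliberate choice. If a judge drifts from the requested format, the check fails visibly instead of passing silently.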

A/B Compare

Run the same skill against multiple prompts or configurations side by side. Spot regressions instantly.
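Here is a minimal sketch of how regressions can be spotted when the same evals run under two configurations. It assumes each run produces a pass/fail result per eval name. The `RunResults` shape and the `compareRuns` and `passRate` helpers are hypothetical and do not reflect how vskill stores or compares results internally.

```typescript
// Hypothetical result shape: eval name -> whether it passed under one configuration.
type RunResults = Record<string, boolean>;

interface Comparison {
  baselinePassRate: number;
  candidatePassRate: number;
  regressions: string[]; // passed under the baseline, failed under the candidate
  fixes: string[]; // failed under the baseline, passed under the candidate
}

function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Fraction of evals in one run that passed (0 for an empty run).
function passRate(results: RunResults): number {
  const names = Object.keys(results);
  if (names.length === 0) return 0;
  return names.filter((name) => results[name] === true).length / names.length;
}

// Compare two runs of the same skill, eval by eval. Only evals present in
// both runs can count as a regression or a fix.
function compareRuns(baseline: RunResults, candidate: RunResults): Comparison {
  const regressions: string[] = [];
  const fixes: string[] = [];
  for (const name of Object.keys(baseline)) {
    if (!hasOwn(candidate, name)) continue;
    const before = baseline[name] === true;
    const after = candidate[name] === true;
    if (before && !after) regressions.push(name);
    if (!before && after) fixes.push(name);
  }
  return {
    baselinePassRate: passRate(baseline),
    candidatePassRate: passRate(candidate),
    regressions: regressions.sort(),
    fixes: fixes.sort(),
  };
}
```

The comparison counts only evals present in both runs. As a result, an eval that was added or removed between configurations is never reported as a regression. The per-eval lists also show what changed, which an aggregate pass rate alone can hide.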

Any Model

Test against Claude, GPT, Gemini, Llama, or any model you configure. No vendor lock-in.

100% Local

Everything runs on your machine. No data leaves your environment. No cloud accounts required.

Get Started

Run in your project directory

First time? Run npx vskill init beforehand to scaffold your test suite.

Skill Studio — Test, benchmark, and optimize AI skills locally | vskill