Skill Studio
Write what good looks like. Benchmark against any LLM. Track quality over time. Runs 100% locally.
Features
Plain-English Evals
Describe expected behavior in natural language. An AI judge evaluates outputs semantically, so no test code is required.
A/B Compare
Run the same skill against multiple prompts or configurations side by side. Spot regressions instantly.
Any Model
Test against Claude, GPT, Gemini, Llama, or any model you configure. No vendor lock-in.
100% Local
Everything runs on your machine. No data leaves your environment. No cloud accounts required.
Get Started
Run in your project directory
First time? Start by running npx vskill init to scaffold your test suite.