Skill v1.0.0
Trusted Publisher · 100/100 · microsoft/skills-for-copilot-studio/analyze-evals
──Details
Published: May 25, 2026 at 04:40 PM
Content Hash: sha256:0a17b8949a7580d0...
Git SHA
──Files
Files (1 file, 3.1 KB)
SKILL.md · 3.1 KB · active
SKILL.md · 64 lines · 3.1 KB
version: "1.0.0"
user-invocable: false
description: >
  Analyze exported evaluation results from Copilot Studio's Evaluate tab.
  The user provides a CSV file exported from the Copilot Studio UI; this skill
  parses it, identifies failures, and proposes YAML fixes. No API access or
  published agent required: just the exported CSV.
allowed-tools: Read, Glob, Grep, Edit
context: fork
agent: copilot-studio-test
Analyze Copilot Studio Evaluation Results
Analyze evaluation results exported from the Copilot Studio UI as CSV.
Phase 1: Get Results
- Ask the user for the CSV file path if not already provided. The file is typically exported from Copilot Studio's Evaluate tab and named `Evaluate <agent name> <date>.csv` in their Downloads folder.
- Read the CSV file. The in-product evaluation CSV has these columns:
| Column | Meaning |
|---|---|
| `question` | The test utterance |
| `expectedResponse` | Expected response (may be empty) |
| `actualResponse` | What the agent responded |
| `testMethodType_1` | Eval method (e.g., GeneralQuality) |
| `result_1` | Pass or Fail |
| `passingScore_1` | Score threshold (may be empty) |
| `explanation_1` | Why it passed/failed (e.g., "Seems relevant; Seems incomplete; Knowledge sources not cited") |
The _1 suffix indicates the first eval method. There may be additional methods (_2, _3, etc.) with the same column pattern.
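The column pattern above can be read generically rather than hard-coding `_1`. The following is a minimal Python sketch, assuming the export is an ordinary comma-separated file with a header row. The helper names `method_suffixes` and `load_rows` and the sample row are illustrative only and are not part of the skill or of Copilot Studio.

```python
import csv
import io
import re

# Columns that repeat once per eval method, e.g. result_1, result_2, ...
SUFFIX_RE = re.compile(r"^result_(\d+)$")


def method_suffixes(fieldnames):
    """Return the sorted eval-method indices found in the header (1, 2, ...)."""
    found = set()
    for name in fieldnames:
        match = SUFFIX_RE.match(name)
        if match:
            found.add(int(match.group(1)))
    return sorted(found)


def load_rows(handle):
    """Read the export into a list of dicts plus the method indices present."""
    reader = csv.DictReader(handle)
    rows = list(reader)
    return rows, method_suffixes(reader.fieldnames or [])


# Hypothetical sample shaped like the table above (not a real export).
SAMPLE = (
    "question,expectedResponse,actualResponse,testMethodType_1,result_1,"
    "passingScore_1,explanation_1\n"
    "What is the refund policy?,,Refunds take 5 days.,GeneralQuality,Fail,,"
    '"Seems relevant; Knowledge sources not cited"\n'
)

rows, suffixes = load_rows(io.StringIO(SAMPLE))
print(suffixes, rows[0]["result_1"])
```

For a file on disk, the same `load_rows` call should work with `open(path, newline="", encoding="utf-8-sig")`; the `utf-8-sig` encoding is an assumption that tolerates a byte-order mark if the export includes one.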
Phase 2: Analyze Results
- Focus on failed evaluations (`result_1=Fail`, or any `result_N=Fail`).
- For each failure, use the matching `explanation_N` column to understand the issue:
  - "Question not answered": The agent couldn't handle the question. Check whether a matching topic or knowledge source exists.
  - "Knowledge sources not cited": The agent responded but didn't cite sources. Check the knowledge source configuration and `SearchAndSummarizeContent` nodes.
  - "Seems incomplete": The response was partial. Check the topic flow for early exits, missing branches, or incomplete `SendActivity` messages.
  - Error messages in `actualResponse` (e.g., `GenAIToolPlannerRateLimitReached`): These are runtime errors, not authoring issues. Flag them to the user as transient failures to retry.
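The triage rules in this phase could be sketched in Python as follows. This is an illustration under stated assumptions, not the skill's implementation: the function names are hypothetical, the `Fail` comparison is assumed to be safe to do case-insensitively, explanations are assumed to be semicolon-separated phrases (as in the example in the column table), and the only runtime error code listed is the one named above.

```python
import re

# Explanation phrases from the list above, mapped to what to check next.
CHECKS = {
    "Question not answered": "look for a matching topic or knowledge source",
    "Knowledge sources not cited": "review knowledge source configuration "
    "and SearchAndSummarizeContent nodes",
    "Seems incomplete": "review topic flow for early exits, missing branches, "
    "or incomplete SendActivity messages",
}

# Assumption: runtime errors show up as an error code inside actualResponse.
# Only this code is named in the source; extend the tuple as needed.
RUNTIME_ERRORS = ("GenAIToolPlannerRateLimitReached",)


def failed_methods(row):
    """Return the method indices N for which result_N is Fail."""
    indices = []
    for key, value in row.items():
        if not isinstance(key, str):
            continue  # csv.DictReader uses a None key for surplus fields
        match = re.fullmatch(r"result_(\d+)", key)
        if match and (value or "").strip().lower() == "fail":
            indices.append(int(match.group(1)))
    return sorted(indices)


def triage(row):
    """Return (method index, kind, detail) tuples for one CSV row."""
    findings = []
    actual = row.get("actualResponse") or ""
    for n in failed_methods(row):
        runtime = [code for code in RUNTIME_ERRORS if code in actual]
        if runtime:
            # Transient failure: flag for retry instead of proposing a fix.
            findings.append((n, "transient", runtime[0]))
            continue
        explanation = row.get(f"explanation_{n}") or ""
        for phrase in (part.strip() for part in explanation.split(";")):
            if phrase in CHECKS:
                findings.append((n, "authoring", CHECKS[phrase]))
    return findings


# Hypothetical row, shaped like a parsed line of the export.
example = {
    "question": "What is the refund policy?",
    "actualResponse": "Refunds take 5 days.",
    "result_1": "Fail",
    "explanation_1": "Seems relevant; Seems incomplete; Knowledge sources not cited",
}
for finding in triage(example):
    print(finding)
```

Separating transient failures from authoring failures up front keeps rate-limit errors from producing YAML proposals for topics that are not actually at fault.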
Phase 3: Propose Fixes
- For each failure, identify the relevant YAML file(s):
  - Auto-discover the agent: `Glob: **/agent.mcs.yml`
  - Find the relevant topic by matching the test utterance against trigger phrases and model descriptions
  - Read the topic file to understand the current flow
- Propose specific YAML changes to fix each failure. Present them to the user as a summary:
  - Which test(s) failed and why
  - Which file(s) need changes
  - What the proposed change is (show the diff)
- Wait for user decision. The user can:
  - Accept all: apply all proposed changes
  - Accept partially: apply only some changes (ask which ones)
  - Reject: discard proposed changes and discuss alternative approaches
- Apply accepted changes using the Edit tool. After applying, remind the user to push and publish again before re-running evaluations.
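Two of the steps above, discovering the agent and showing a proposed change as a diff, can be approximated outside the skill with the Python standard library. This is a rough sketch, not how the skill's Glob and Edit tools work internally: the function names, the topic path, and the one-line file content are hypothetical placeholders and do not reflect the real Copilot Studio YAML schema.

```python
import difflib
from pathlib import Path


def find_agent_roots(workspace):
    """Return folders containing agent.mcs.yml, like the **/agent.mcs.yml glob."""
    return sorted(p.parent for p in Path(workspace).glob("**/agent.mcs.yml"))


def render_diff(path, current_text, proposed_text):
    """Return a unified diff between the current and proposed file contents."""
    diff = difflib.unified_diff(
        current_text.splitlines(keepends=True),
        proposed_text.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


# Placeholder content only; a real topic file will look different.
current = "instructions: Answer briefly.\n"
proposed = "instructions: Answer briefly and cite sources.\n"

diff_text = render_diff("topics/Refunds.mcs.yml", current, proposed)
print(diff_text)
```

Rendering the diff before touching any file matches the flow above: nothing is written until the user accepts all or part of the proposal.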