---
version: "1.0.0"
user-invocable: false
description: >
  Run a batch test suite via the Copilot Studio Kit (Dataverse API). Uses the Power CAT
  Copilot Studio Kit to execute test cases against a published agent and produces pass/fail
  results with latencies. Requires the Kit installed in the environment, an App Registration
  with Dataverse permissions, and a published agent.
allowed-tools: Bash(node run-tests.js ), Bash(npm install *), Read, Write, Glob, Grep, Edit
context: fork
agent: copilot-studio-test
---
# Run Tests via Copilot Studio Kit
Run a batch test suite against a published Copilot Studio agent using the Power CAT Copilot Studio Kit.
## Prerequisites
The user must have:
- The [Copilot Studio Kit](https://github.com/microsoft/Power-CAT-Copilot-Studio-Kit) installed in their Power Platform environment
- Published their agent in the Copilot Studio UI
- Created a test set in the Copilot Studio Kit
- An Azure App Registration with Dataverse permissions
## Phase 1: Configure Settings
- Read `tests/settings.json` (relative to the user's project CWD) and check for missing or placeholder values (containing `YOUR_`).
- If the file doesn't exist, create it from the template:
  ```bash
  # Make sure the tests/ folder exists before copying into it
  mkdir -p tests
  cp ${CLAUDE_SKILL_DIR}/../../tests/settings-example.json ./tests/settings.json
  ```
- If values are missing, ask the user for each missing value. Explain where to find each one:
  - Environment URL (`dataverse.environmentUrl`): "What is your Dataverse environment URL? Find it in the Power Platform admin center or in Copilot Studio > Settings > Session Details. It looks like `https://orgXXXXXX.crm.dynamics.com`."
  - Tenant ID (`dataverse.tenantId`): "What is your Azure tenant ID? Find it in Azure Portal > Microsoft Entra ID > Overview. It's a GUID like `c87f36f7-fc65-453c-9019-0d724f21bc42`."
  - Client ID (`dataverse.clientId`): "What is your App Registration client ID? Find it in Azure Portal > App Registrations > your app > Application (client) ID. It's a GUID."
  - Agent Configuration ID (`testRun.agentConfigurationId`): "What is your agent configuration ID? In Copilot Studio, go to your agent > Tests tab. The ID is a GUID found in the URL or test configuration."
  - Test Set ID (`testRun.agentTestSetId`): "What is your test set ID? In Copilot Studio, go to your agent > Tests tab > select your test set. The ID is a GUID found in the URL."

  Ask for ALL missing values at once (don't ask one at a time).
- Write `tests/settings.json` with the collected values:
  ```json
  {
    "dataverse": {
      "environmentUrl": "<value>",
      "tenantId": "<value>",
      "clientId": "<value>"
    },
    "testRun": {
      "agentConfigurationId": "<value>",
      "agentTestSetId": "<value>"
    }
  }
  ```
- If all values are already configured and valid, proceed to Phase 2.
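The placeholder check in the first step can be sketched in a few lines of Python. This is a minimal sketch, assuming the key layout from the `settings.json` template above; `load_settings` and `find_missing` are hypothetical helper names, and the GUID and `https://` format checks are an extra assumption that goes beyond the `YOUR_` rule in the skill.

```python
import json
import re
from pathlib import Path

# Keys the settings.json template expects, grouped by section.
REQUIRED = {
    "dataverse": ["environmentUrl", "tenantId", "clientId"],
    "testRun": ["agentConfigurationId", "agentTestSetId"],
}
# Extra format checks (an assumption, not stated in the skill): these keys should hold GUIDs.
GUID_KEYS = {"tenantId", "clientId", "agentConfigurationId", "agentTestSetId"}
GUID_RE = re.compile(r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$")


def load_settings(path: Path) -> dict:
    """Read settings.json; a missing file counts as an empty configuration."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def find_missing(settings: dict) -> list[str]:
    """Return dotted keys whose values are absent, empty, YOUR_ placeholders, or malformed."""
    missing = []
    for section, keys in REQUIRED.items():
        block = settings.get(section) or {}
        for key in keys:
            value = block.get(key)
            if not isinstance(value, str) or not value.strip() or "YOUR_" in value:
                missing.append(f"{section}.{key}")
            elif key in GUID_KEYS and not GUID_RE.match(value):
                missing.append(f"{section}.{key}")
            elif key == "environmentUrl" and not value.startswith("https://"):
                missing.append(f"{section}.{key}")
    return missing


if __name__ == "__main__":
    todo = find_missing(load_settings(Path("tests/settings.json")))
    print("All settings configured" if not todo else "Ask the user for: " + ", ".join(todo))
```

Returning every gap in one list matches the instruction to ask for all missing values at once rather than one at a time.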
## Phase 2: Run Tests
- Ensure `tests/package.json` exists in the user's project. If not, copy it:
  ```bash
  cp ${CLAUDE_SKILL_DIR}/../../tests/package.json ./tests/package.json
  ```
- Install dependencies if `tests/node_modules/` doesn't exist:
  ```bash
  npm install --prefix tests
  ```
- Run the test script in the background with a 100-minute timeout (6000000 ms):
  ```bash
  node ${CLAUDE_SKILL_DIR}/../../tests/run-tests.js --config-dir ./tests
  ```
  Use `run_in_background: true` for this command. Save the returned task ID.
- Wait 10 seconds, then check the background task output (non-blocking check).
- Detect the authentication state from the output:
- If the output contains "Using cached token": Authentication succeeded automatically. Tell the user: "Authentication successful (cached credentials). Tests are running; this may take several minutes..."
- If the output contains "use a web browser to open the page": Extract the URL and device code from the message. Present this prominently to the user:
  > Authentication Required
  >
  > Open your browser to: https://microsoft.com/devicelogin
  >
  > Enter the code: XXXXXXXXX (extract the actual code from the output)
  >
  > After signing in, the tests will continue automatically.
- If the output contains an error: Report the error to the user and stop.
- If the output is empty or incomplete: Wait another 10 seconds and check again (retry up to 3 times).
- Wait for the background task to complete (blocking). The script polls every 20 seconds until all tests finish and downloads results as a CSV.
- Read the final output to get the success rate and CSV filename.
- Proceed to Phase 3.
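The authentication checks in this phase amount to a small classifier plus a bounded retry loop. The Python below is a hypothetical sketch, not part of the Kit: `read_output` stands in for however the background task's output is fetched, the device-code regex assumes the prompt has the usual "open the page <URL> and enter the code <CODE>" wording, and the error rule (a line starting with "Error") is a guessed heuristic.

```python
import re
import time
from typing import Callable, Optional, Tuple

CACHED_MARKER = "Using cached token"
DEVICE_MARKER = "use a web browser to open the page"
# Hypothetical: assumes "...open the page <URL> and enter the code <CODE>..." wording.
DEVICE_CODE_RE = re.compile(r"(https://\S+).*?enter the code\s+(\w+)", re.IGNORECASE | re.DOTALL)
# Hypothetical heuristic: any line that starts with "Error" is treated as a failure.
ERROR_RE = re.compile(r"^\s*error\b", re.IGNORECASE | re.MULTILINE)

State = Tuple[str, Optional[Tuple[str, str]]]


def classify_auth(output: str) -> State:
    """Return ("cached" | "device_code" | "error" | "pending", optional (url, code))."""
    if CACHED_MARKER in output:
        return ("cached", None)
    if DEVICE_MARKER in output:
        match = DEVICE_CODE_RE.search(output)
        return ("device_code", (match.group(1), match.group(2)) if match else None)
    if ERROR_RE.search(output):
        return ("error", None)
    return ("pending", None)  # empty or incomplete output


def wait_for_auth(read_output: Callable[[], str], delay_s: float = 10, retries: int = 3) -> State:
    """First check after delay_s, then up to `retries` more checks while still pending."""
    state: State = ("pending", None)
    for _ in range(1 + retries):
        time.sleep(delay_s)
        state = classify_auth(read_output())
        if state[0] != "pending":
            break
    return state
```

With the defaults this performs one check after 10 seconds plus up to three retries, so the skill gives up on an empty output after roughly 40 seconds.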
## Phase 3: Analyze Results
- Get the results: run `Glob: tests/test-results-*.csv` and read the most recent CSV file (newest by modification time).
- Parse the CSV columns:
  | Column | Meaning |
  |---|---|
  | Test Utterance | The user message that was tested |
  | Expected Response | What the test expected |
  | Response | What the agent actually responded |
  | Latency (ms) | Response time in milliseconds |
  | Result | `Success`, `Failed`, `Unknown`, `Error`, or `Pending` |
  | Test Type | `Response Match`, `Topic Match`, `Generative Answers`, `Multi-turn`, `Plan Validation`, or `Attachments` |
  | Result Reason | Why the test passed or failed |
- Focus on failed tests (Result = `Failed` or `Error`). For each failure, analyze:
  - Test Type = Topic Match: The wrong topic was triggered, or no topic matched. Check trigger phrases and model descriptions.
  - Test Type = Response Match: The response didn't match the expected response. Check `SendActivity` messages, instructions, or the generative answer configuration.
  - Test Type = Generative Answers: The generative answer was incorrect or missing. Check knowledge sources, `SearchAndSummarizeContent`, and agent instructions.
  - Test Type = Plan Validation: The orchestrator's plan was wrong. Check topic descriptions and agent-level instructions.
  - Test Type = Multi-turn: A multi-turn conversation failed. Check topic flow, variable handling, and conditions.
- Proceed to Phase 4 (Propose Fixes).
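Picking the newest results file and grouping its failures by test type can be sketched with the standard `csv` and `glob` modules. This is a minimal Python sketch, assuming the CSV header uses exactly the column names in the table above and that `Result` holds the text labels (not numeric codes); `newest_results_csv`, `failures_by_type`, and the `CHECKS` hints are hypothetical names summarising the list above.

```python
import csv
import glob
import os
from collections import defaultdict
from typing import Optional

# Result values treated as failures, per the Result column described above.
FAILED_RESULTS = {"Failed", "Error"}

# Where to look first for each failing test type (summarised from the list above).
CHECKS = {
    "Topic Match": "trigger phrases and model descriptions",
    "Response Match": "SendActivity messages, instructions, generative answer config",
    "Generative Answers": "knowledge sources, SearchAndSummarizeContent, agent instructions",
    "Plan Validation": "topic descriptions and agent-level instructions",
    "Multi-turn": "topic flow, variable handling, conditions",
}


def newest_results_csv(pattern: str = "tests/test-results-*.csv") -> Optional[str]:
    """Return the most recently modified file matching pattern, or None if there is none."""
    files = glob.glob(pattern)
    return max(files, key=os.path.getmtime) if files else None


def failures_by_type(csv_path: str) -> dict[str, list[dict]]:
    """Group Failed/Error rows by their Test Type column."""
    grouped = defaultdict(list)
    # utf-8-sig strips a byte-order mark if the export has one (an assumption about the file).
    with open(csv_path, newline="", encoding="utf-8-sig") as fh:
        for row in csv.DictReader(fh):
            if (row.get("Result") or "").strip() in FAILED_RESULTS:
                grouped[(row.get("Test Type") or "").strip()].append(row)
    return dict(grouped)


if __name__ == "__main__":
    path = newest_results_csv()
    if path is None:
        print("No results CSV found")
    else:
        for test_type, rows in failures_by_type(path).items():
            hint = CHECKS.get(test_type, "the test definition")
            print(f"{test_type}: {len(rows)} failure(s); check {hint}")
            for row in rows:
                print(f"  - {row.get('Test Utterance')!r}: {row.get('Result Reason')}")
```

Grouping by test type keeps related failures together, which tends to point at a single shared cause (for example, several Topic Match failures caused by one overlapping trigger phrase).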
## Phase 4: Propose Fixes
- For each failure, identify the relevant YAML file(s):
  - Auto-discover the agent: `Glob: **/agent.mcs.yml`
  - Find the relevant topic by matching the test utterance against trigger phrases and model descriptions
  - Read the topic file to understand the current flow
- Propose specific YAML changes to fix each failure. Present them to the user as a summary:
- Which test(s) failed and why
- Which file(s) need changes
- What the proposed change is (show the diff)
- Wait for the user's decision. The user can:
- Accept all — apply all proposed changes
- Accept partially — apply only some changes (ask which ones)
- Reject — discard proposed changes and discuss alternative approaches
- Apply accepted changes using the Edit tool. After applying, remind the user to push and publish again before re-running tests.
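Finding the topic that a failing utterance should have hit can be approximated with a crude shortlist before reading files in detail. The Python below is a hypothetical heuristic, not how the skill or Copilot Studio resolves topics: it globs `**/agent.mcs.yml`, assumes topic files are other `*.mcs.yml` files under the same folder, and ranks them by plain word overlap with the utterance.

```python
import glob
import os
import re


def tokens(text: str) -> set[str]:
    """Lower-case word set; deliberately crude."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def rank_topic_files(utterance: str, root: str = ".") -> list[tuple[float, str]]:
    """Score each *.mcs.yml below an agent.mcs.yml by how many utterance words it contains."""
    wanted = tokens(utterance)
    scored = []
    for agent_file in glob.glob(os.path.join(root, "**", "agent.mcs.yml"), recursive=True):
        agent_dir = os.path.dirname(agent_file)
        pattern = os.path.join(agent_dir, "**", "*.mcs.yml")
        for path in glob.glob(pattern, recursive=True):
            if os.path.abspath(path) == os.path.abspath(agent_file):
                continue  # skip the agent definition itself
            with open(path, encoding="utf-8") as fh:
                overlap = len(wanted & tokens(fh.read()))
            # Fraction of the utterance's words that appear anywhere in the file.
            scored.append((overlap / max(len(wanted), 1), path))
    # Highest score first; ties fall back to reverse path order.
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    for score, path in rank_topic_files("I want to book a meeting room")[:5]:
        print(f"{score:.2f}  {path}")
```

Treat the ranking as a list of files to read, not an answer: word overlap ignores how the agent actually routes utterances, so a topic with few shared words can still be the right one.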
## Test Result Codes Reference
- Result: 1=Success, 2=Failed, 3=Unknown, 4=Error, 5=Pending
- Test Type: 1=Response Match, 2=Topic Match, 3=Attachments, 4=Generative Answers, 5=Multi-turn, 6=Plan Validation
- Run Status: 1=Not Run, 2=Running, 3=Complete, 4=Not Available, 5=Pending, 6=Error
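When raw numeric values show up (for example, in Dataverse records rather than the CSV), a lookup table turns them back into the labels used elsewhere in this skill. This is a hypothetical Python sketch built only from the codes listed above; `decode` and `is_failure` are illustrative names, and treating these numbers as the stored option values is an assumption.

```python
# Code tables copied from the reference list above.
RESULT = {1: "Success", 2: "Failed", 3: "Unknown", 4: "Error", 5: "Pending"}
TEST_TYPE = {
    1: "Response Match", 2: "Topic Match", 3: "Attachments",
    4: "Generative Answers", 5: "Multi-turn", 6: "Plan Validation",
}
RUN_STATUS = {1: "Not Run", 2: "Running", 3: "Complete", 4: "Not Available", 5: "Pending", 6: "Error"}


def decode(table: dict[int, str], code) -> str:
    """Translate a numeric code into its label, tolerating string input and unknown values."""
    try:
        return table[int(code)]
    except (KeyError, TypeError, ValueError):
        return f"Unknown code {code!r}"


def is_failure(result_code) -> bool:
    """True for the results Phase 3 focuses on (Failed or Error)."""
    return decode(RESULT, result_code) in {"Failed", "Error"}
```

For example, `decode(TEST_TYPE, "4")` yields `"Generative Answers"`, and `is_failure(4)` is `True` because code 4 means Error.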