Skill v1.0.0
Trusted Publisher100/100version: "1.0.0"
name: maf-online-endpoint description: "Deploy a Microsoft Agent Framework (MAF) workflow as a managed online endpoint to an Azure ML workspace or an Azure AI Foundry hub-based project. Wraps any workflow into an init()/run() scoring script, creates conda environment, endpoint and deployment YAMLs, deploy script, and assigns RBAC. Supports managed identity auth and Application Insights tracing. WHEN: deploy MAF workflow, deploy agent-framework workflow, create online endpoint for MAF, deploy workflow to AML, deploy workflow to Foundry project, managed online endpoint for agent workflow, wrap workflow in scoring script, deploy agent as endpoint, realtime endpoint in Foundry project."
Deploy MAF Workflow as a Managed Online Endpoint
This skill wraps a Microsoft Agent Framework (agent-framework) workflow into
a managed online endpoint using the standard scoring-script pattern
(init() / run()), following the patterns from the
azureml-examples managed endpoint samples.
The endpoint can be deployed to either:
| Deployment Target | Description | |
|---|---|---|
| Azure Machine Learning workspace | Standalone AML workspace — user provides subscription, resource group, and workspace name | |
| Azure AI Foundry hub-based project | An AI project living under a Foundry hub — the project name is the workspace name for az ml commands |
Both targets produce the same generated files (online-deployment/ directory)
and use the same `az ml` CLI / `azure-ai-ml` Python SDK. The difference is
in how the workspace is identified and RBAC scope.
Overview
The deployment creates all files in an online-deployment/ subdirectory under
the project root:
<project-root>/workflow.py ← the MAF workflowonline-deployment/score.py ← scoring scriptconda.yml ← conda environmentendpoint.yml ← endpoint configdeployment.yml ← deployment template (${VAR} placeholders)deploy.sh ← deploy script (Bash; see notes for Windows).gitignore ← ignores rendered YAML with secrets
- Scoring script (
score.py) —init()imports the workflow factory;
run() creates a fresh workflow instance per request to avoid concurrency errors.
- Conda environment (
conda.yml) — Python 3.11 with agent-framework and
azureml-inference-server-http.
- Endpoint YAML (
endpoint.yml) — endpoint name and auth mode. - Deployment YAML (
deployment.yml) — template with${VAR}placeholders
for environment variables, instance config, and request settings.
- Deploy script (
deploy.sh) — renders the template, creates the endpoint
and deployment, runs a smoke test.
Path resolution rule: AML CLI resolvesconda_file,code, andscoring_scriptpaths relative to the YAML file location, not the CWD.Since the YAML is insideonline-deployment/, useconda_file: conda.yml(same directory) andcode: ..(parent = project root).
Agent Interaction Pattern
When the user asks to deploy a MAF workflow as an online endpoint:
- Ask for the deployment target using
vscode_askQuestions:
- Azure Machine Learning workspace — standalone AML workspace
- Azure AI Foundry project — hub-based project (the project name is the
workspace name)
- Ask for infrastructure variables (Step 0 §A) using
vscode_askQuestions:
subscription, resource group, workspace/project name, and the workflow file path.
- Read the workflow file. Inspect imports and
os.environ/os.getenv
calls to discover what the workflow needs (Step 0 §B).
- Ask the user to provide values for any workflow-specific variables that
have no defaults.
- Generate the files from the templates in ./assets/ into an
online-deployment/ subdirectory under the project root.
- Run deployment commands via terminal. On Windows, run
azCLI commands
directly in PowerShell (the Bash deploy.sh won't work). On Linux/macOS,
use deploy.sh or run the commands directly.
- Assign RBAC (Step 6) — only needed for managed-identity workflows
(Foundry/DefaultAzureCredential). Skip for API-key workflows. For Foundry hub-based projects, scope the role assignment to the hub's AI Services resource.
- Wait 5–10 minutes for RBAC propagation (if applicable), then run smoke
test.
- Report the scoring URI and remind user to
.gitignorerendered YAML
files that contain secrets.
Step 0 — Gather Required Information
A. Online Endpoint Infrastructure (always required)
The same variables apply to both deployment targets. For a **Foundry hub-based project, `WORKSPACE_NAME` is the AI project name** (not the hub name).
| Variable | Description | Default | |
|---|---|---|---|
SUBSCRIPTION_ID | Azure subscription | _(required)_ | |
RESOURCE_GROUP | Resource group containing the AML workspace or AI project | _(required)_ | |
WORKSPACE_NAME | AML workspace name or AI Foundry project name | _(required)_ | |
ENDPOINT_NAME | Name of the online endpoint | maf-endpoint | |
DEPLOYMENT_NAME | Deployment name under the endpoint | blue | |
INSTANCE_TYPE | VM SKU | Standard_DS3_v2 | |
INSTANCE_COUNT | Number of instances | 1 | |
REQUEST_TIMEOUT_MS | Request timeout in ms | 60000 |
Foundry project note: An AI Foundry hub-based project is backed by an AMLworkspace. Allaz mlcommands and theMLClientSDK work the same way —just use the project name as--workspace-name. The endpoint scoring URIformat is identical:https://<endpoint-name>.<region>.inference.ml.azure.com/score
B. Workflow Requirements (depends on the workflow)
Read the user's workflow file and inspect:
- Imports — determine pip packages for
conda.yml. - `os.environ[...]` / `os.getenv(...)` calls — determine environment
variables the deployment must inject.
- Credential usage —
DefaultAzureCredential/ManagedIdentityCredential
means RBAC must be set up; an API key means a secret env var.
Common workflow patterns
| Pattern | Imports | Required env vars | Extra pip packages | RBAC role | |
|---|---|---|---|---|---|
| Foundry LLM | FoundryChatClient, DefaultAzureCredential | FOUNDRY_PROJECT_ENDPOINT, FOUNDRY_MODEL | agent-framework | Cognitive Services User | |
| OpenAI API key | OpenAIChatClient | AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_DEPLOYMENT, AZURE_OPENAI_API_KEY | agent-framework, agent-framework-openai | _(none — uses API key)_ | |
| RAG (AI Search) | AzureAISearchContextProvider | above + AZURE_AI_SEARCH_ENDPOINT, AZURE_AI_SEARCH_INDEX_NAME, AZURE_AI_SEARCH_API_KEY | above + agent-framework-azure-ai-search | above (Search uses API key) | |
| Function tools | plain Python functions | same as Foundry LLM | same as Foundry LLM | same as Foundry LLM |
C. RBAC (after endpoint is created)
Get endpoint managed identity principal ID:
az ml online-endpoint show --name <endpoint> --query identity.principal_id -o tsv
D. Optional (cross-cutting)
| Variable | Default | Description | |
|---|---|---|---|
APPLICATIONINSIGHTS_CONNECTION_STRING | _(empty)_ | Enables OpenTelemetry tracing |
Step 1 — Generate the Scoring Script
Use the template at ./assets/score.py.
Key decisions:
AgentResponseis not JSON-serializable → extract.textbefore returning.project_root = Path(__file__).resolve().parents[1]—score.pyis one
level deep (online-deployment/score.py), so parents[1] reaches the
project root. Adjust if your layout differs.
asyncio.get_event_loop().run_until_complete()bridges the syncrun()to
the async workflow.
- Optional Application Insights tracing configured via env var.
- Input key: Inspect the workflow to determine what key to parse from the
request body (e.g. "text", "question"). Adapt accordingly.
- Factory import:
init()imports thecreate_workflowfactory from
workflow.py. Each run() call invokes the factory to get a fresh workflow
instance, avoiding RuntimeError: Workflow is already running on concurrent
requests.
Step 2 — Generate conda.yml
Use the template at ./assets/conda.yml.
Important:
- Do NOT pin version constraints unless the user specifies them. Packages like
agent-framework-azure-ai-search may not have published version ranges on
PyPI, which causes image build failures.
- Only include packages the workflow actually uses. For OpenAI API key
workflows, include agent-framework-openai but omit
agent-framework-azure-ai-search and azure-monitor-opentelemetry unless
needed.
Step 3 — Generate endpoint.yml
Use the template at ./assets/endpoint.yml.
Step 4 — Generate deployment.yml (Template)
Use the template at ./assets/deployment.yml.
Critical — path resolution:
AML CLI resolves all relative paths in the deployment YAML **relative to the
YAML file's location**, not the working directory. Since deployment files live
in online-deployment/:
environment:conda_file: conda.yml # ← same dir as deployment.ymlcode_configuration:code: .. # ← parent dir = project rootscoring_script: online-deployment/score.py # ← relative to code root
Getting this wrong causes a double-nesting error like
online-deployment/online-deployment/conda.yml.
Other key settings:
request_timeout_ms: 60000— LLM calls typically take 5–30 s; the AML
default of 5 s causes timeouts.
- Use
conda_file(notpip_requirements) — the latter is not valid for
inline environment definitions.
- When rendering with
envsubst, use a restricted variable list so
$schema is not eaten.
- Only include env vars the workflow actually needs; omit unused ones.
Security: The rendered YAML (deployment-rendered.yml) may contain API
keys in plaintext. A .gitignore file is generated automatically to exclude
it (see Step 4b).
Step 4b — Generate .gitignore
Always create online-deployment/.gitignore to prevent rendered YAML files
containing secrets from being committed:
deployment-rendered.yml
Step 5 — Deploy
Option A: Bash script (Linux/macOS)
Use the template at ./assets/deploy.sh. Requires
envsubst (part of gettext).
Option B: Direct CLI commands (Windows / any OS)
On Windows, deploy.sh won't work (envsubst, mktemp, process substitution
are unavailable). Instead, run the steps directly in PowerShell:
# 1. Render deployment YAML (replace placeholders with actual values)$content = Get-Content online-deployment/deployment.yml -Raw$content = $content -replace '\$\{AZURE_OPENAI_ENDPOINT\}', $env:AZURE_OPENAI_ENDPOINT# ... repeat for each placeholder ...Set-Content -Path online-deployment/deployment-rendered.yml -Value $content# 2. Create endpointaz ml online-endpoint create `--subscription $SUBSCRIPTION_ID `--resource-group $RESOURCE_GROUP `--workspace-name $WORKSPACE_NAME `--file online-deployment/endpoint.yml# 3. Create deployment (run from the project root directory!)az ml online-deployment create `--subscription $SUBSCRIPTION_ID `--resource-group $RESOURCE_GROUP `--workspace-name $WORKSPACE_NAME `--file online-deployment/deployment-rendered.yml `--all-traffic# 4. Smoke testSet-Content -Path online-deployment/request.json -Value '{"text": "Hello"}'az ml online-endpoint invoke `--subscription $SUBSCRIPTION_ID `--resource-group $RESOURCE_GROUP `--workspace-name $WORKSPACE_NAME `--name <ENDPOINT_NAME> `--request-file online-deployment/request.json
Important: Run theaz ml online-deployment createcommand from theproject root directory, not from insideonline-deployment/. The CLIresolvescode: ..relative to the YAML file, but the CWD also matters forfinding the YAML file itself.
Step 6 — RBAC for Managed Identity
After the endpoint is created, its system-assigned managed identity needs the `Cognitive Services User` role on the Foundry resource.
Finding the AI Services resource
- Standalone AML workspace: The user must provide the AI Services
(Cognitive Services) resource name and resource group.
- Foundry hub-based project: The AI Services resource is linked to the hub.
Find it via the Azure Portal (Hub → Connected resources) or via CLI:
``bash
az ml workspace show \
--name <project-name> \
--resource-group <rg> \
--query "associated_workspaces" -o table
``
Assign the role
# Get principal IDPRINCIPAL_ID=$(az ml online-endpoint show \--subscription "$SUBSCRIPTION_ID" \--name <ENDPOINT_NAME> \--resource-group "$RESOURCE_GROUP" \--workspace-name "$WORKSPACE_NAME" \--query identity.principal_id -o tsv)# Assign roleaz role assignment create \--assignee-object-id "$PRINCIPAL_ID" \--assignee-principal-type ServicePrincipal \--role "Cognitive Services User" \--scope "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.CognitiveServices/accounts/<account>"
Why `Cognitive Services User`?
Azure AI Developerdoes not includeMicrosoft.CognitiveServices/accounts/AIServices/agents/write.Cognitive Services Userhas the wildcardMicrosoft.CognitiveServices/*.- Allow 5–10 minutes for RBAC data plane propagation.
Foundry hub-based project — additional considerations
- The hub's managed network may restrict outbound access. Ensure the
endpoint can reach the AI Services resource and any external APIs the workflow calls. If the hub uses a private endpoint, no extra steps are needed for AI Services calls within the same VNet.
- The user deploying must have the Azure AI Developer role (or equivalent)
on the resource group to create endpoints and deployments in the project.
See ./references/managed-identity.md for full details.
Troubleshooting
| Symptom | Cause | Fix | |
|---|---|---|---|
No such file: .../online-deployment/online-deployment/conda.yml | Paths in deployment YAML resolved relative to YAML location, not CWD | Use conda_file: conda.yml and code: .. when YAML is in a subdirectory | |
401 PermissionDenied | Missing RBAC | Assign Cognitive Services User on Foundry resource | |
upstream request timeout | 5 s default too short | request_timeout_ms: 60000 | |
AgentResponse is not JSON serializable | Returning raw workflow output | Extract .text from the response | |
pip_requirements validation error | Invalid field for inline env | Use conda_file instead | |
| Image build fails on version constraints | Package not on PyPI with that version | Remove version pins from conda.yml | |
$schema missing after envsubst | Unrestricted envsubst eats $schema | Use restricted variable list | |
FileNotFoundError: az (Windows subprocess) | az is a .cmd file on Windows | Use shell=True in subprocess.run | |
envsubst not found (Windows) | envsubst is a Linux tool | Use PowerShell string replacement (see Step 5 Option B) | |
| Deployment fails in Foundry project with network error | Hub managed network blocks outbound access | Check hub network settings; add outbound rules for required endpoints | |
| Cannot create endpoint in Foundry project | Insufficient RBAC on the project | User needs Azure AI Developer role on the resource group |
See ./references/troubleshooting.md for extended diagnostics.