version: "1.0.1"
name: dd-apm
description: APM - install, onboard, instrument, enable, set up, configure, traces, services, dependencies, performance analysis. Use for any request involving Datadog APM setup, instrumentation (SSI, ddtrace, agent install), or analysis.
alwaysApply: true
metadata:
  version: "1.1.0"
  author: datadog-labs
  repository: https://github.com/datadog-labs/agent-skills
  tags: datadog,apm,tracing,performance,distributed-tracing,dd-apm,install,onboarding,instrumentation,ssi,agent
  globs: "/ddtrace*,/datadog*.yaml,**/trace"
Datadog APM
Distributed tracing, service maps, and performance analysis.
Routing — Read This First
Match the user's request to one of the entries below. Each entry has the same shape: triggers → which sub-skill to load → the anti-pattern to avoid. If a request seems to fit more than one entry, see "Overlap disambiguation". If nothing matches, see "None of the above" at the end.
Kubernetes APM install / instrument / onboard — trigger when the user mentions Kubernetes, K8s, EKS, GKE, AKS, kind, minikube, K3s, helm, DatadogAgent CR, kubectl, SSI on a cluster, pod injection, or init containers.
Immediately read .claude/skills/dd-apm/k8s-ssi/agent-install/SKILL.md now, then .claude/skills/dd-apm/k8s-ssi/enable-ssi/SKILL.md, then .claude/skills/dd-apm/k8s-ssi/verify-ssi/SKILL.md — do not proceed from memory.
⚠️ Common wrong approaches that LOOK like they work but silently fail:
- `helm install datadog datadog/datadog`: the standard chart does NOT support SSI via DatadogAgent CR.
- Adding `ddtrace` imports or `ddtrace-run` to the app: SSI auto-instruments WITHOUT any code changes.
- `admission.datadoghq.com/enabled` annotations: that's admission controller config injection, not SSI init container injection.
Linux APM install / instrument / onboard — trigger when the user mentions a single host, VM, EC2 instance, bare-metal, RHEL/Ubuntu/Debian, systemd, or no orchestrator.
Immediately read .claude/skills/dd-apm/linux-ssi/agent-install/SKILL.md now, then .claude/skills/dd-apm/linux-ssi/enable-ssi/SKILL.md, then .claude/skills/dd-apm/linux-ssi/verify-ssi/SKILL.md — do not proceed from memory.
⚠️ Do NOT install the agent via plain `apt-get install datadog-agent` (or the yum equivalent) and assume SSI follows. Host auto-instrumentation requires the install script with the SSI flags, which the sub-skill walks through.
Service rename / service remapping — trigger when the user mentions renaming a service, collapsing multiple service names, stripping suffixes/prefixes, or cleaning up inferred services.
Immediately read .claude/skills/dd-apm/service-remapping/SKILL.md now — do not proceed from memory.
⚠️ Do NOT change `tags.datadoghq.com/service` labels or `DD_SERVICE` env vars to rename a service in Datadog. That requires a rollout and only affects new data. Use a service remapping rule instead: it rewrites the name at ingestion time with no deployment change.
Overlap disambiguation
When a request could plausibly fit more than one entry above, use these tiebreakers:
| Hint | Route to |
|---|---|
| Cluster orchestrator mentioned (EKS/GKE/AKS/kind/K3s/minikube), even if "just one node" | k8s-ssi |
| Single host, VM, or EC2 with no orchestrator | linux-ssi |
| "Several services that should be one" | service-remapping. The sub-skill picks the rule type based on whether the duplicates are real instrumented services or inferred entities (DBs, queues, external APIs). |
| "My service shows under the wrong name" | First check DD_SERVICE on the deploy. If correct and the name is still wrong → service-remapping. |
| "Reduce APM volume / cost / noise" | No sub-skill yet. Ask whether the user means sampling (fewer ingested traces) or retention filters (less indexed data) before suggesting commands. |
None of the above
If the request doesn't match any entry above, continue reading the trace-search, service analysis, and metrics content below. If even that doesn't fit, ask the user to clarify — do not invent a workflow.
Requirements
Datadog Labs Pup should be installed. See Setup Pup if not.
Command Execution Order (Token-Efficient)
For scoped commands, use this order:
- Check context first (prior outputs, conversation, saved values).
- If a required value is missing, run a discovery command first.
- If still ambiguous, ask the user to confirm.
- Then run the target command.
- Avoid speculative commands likely to fail.
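
The ordering above can be sketched as a small shell helper. This is only an illustration under assumptions: `resolve_env` is a hypothetical function, not part of pup, and the "discovered" values stand in for whatever a discovery command would return.

```shell
#!/bin/sh
# Hypothetical helper (not a pup command): pick a value for a scoped
# command by checking context first, then discovery, then the user.
#
# resolve_env KNOWN DISCOVERED
#   KNOWN       value already available from context (may be empty)
#   DISCOVERED  newline-separated candidates from a discovery step
resolve_env() {
    known=$1
    discovered=$2

    # 1. Context first: reuse a value we already have.
    if [ -n "$known" ]; then
        printf 'use:%s\n' "$known"
        return 0
    fi

    # 2. Discovery: drop blank lines, then count the candidates.
    candidate=$(printf '%s\n' "$discovered" | grep . || true)
    count=$(printf '%s\n' "$candidate" | grep -c . || true)

    if [ "$count" -eq 1 ]; then
        printf 'use:%s\n' "$candidate"
        return 0
    fi

    # 3. Zero or several candidates: do not guess, ask the user.
    printf 'ask-user\n'
    return 1
}

# Nothing in context, exactly one discovered value.
resolve_env "" "staging"
```

The helper only runs the target command's prerequisite logic; it never falls back to a default such as `prod`, which matches the rule against speculative commands.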
Quick Start
    pup auth login
    # Confirm env tag with the user first (do not assume production/prod/prd).
    pup apm services list --env <env> --from 1h --to now
    pup traces search --query "service:api-gateway" --from 1h
Services
List Services
    pup apm services list --env <env> --from 1h --to now
    pup apm services stats --env <env> --from 1h --to now
Service Stats
    pup apm services stats --env <env> --from 1h --to now
Service Map
    # View dependencies
    pup apm flow-map --query "service:api-gateway&from=$(($(date +%s)-3600))000&to=$(date +%s)000" --env <env> --limit 10
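
The `from` and `to` values in the flow-map query are epoch timestamps in milliseconds, built by appending `000` to whole seconds from `date +%s`. A minimal sketch of that arithmetic, assuming a hypothetical helper named `window_ms` (not part of pup):

```shell
#!/bin/sh
# Hypothetical helper (not a pup command).
# window_ms NOW_SECONDS LOOKBACK_SECONDS
# Prints "from=<ms>&to=<ms>" for a window that ends at NOW_SECONDS.
window_ms() {
    now=$1
    lookback=$2
    from=$((now - lookback))
    # Appending "000" turns whole seconds into milliseconds.
    printf 'from=%s000&to=%s000\n' "$from" "$now"
}

# Last hour, the same window the command above uses.
window_ms "$(date +%s)" 3600
```

Changing the second argument (for example `900` for 15 minutes) changes the window without touching the rest of the query string.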
Traces
Search Traces
    # By service
    pup traces search --query "service:api-gateway" --from 1h
    # Errors only
    pup traces search --query "service:api-gateway status:error" --from 1h
    # Slow traces (>1s)
    pup traces search --query "service:api-gateway @duration:>1000ms" --from 1h
    # With specific tag
    pup traces search --query "service:api-gateway @http.url:/api/users" --from 1h
Trace Detail
    # No direct get command for a single trace ID.
    # Use traces search with a narrow query and time window.
    pup traces search --query "trace_id:<trace_id>" --from 1h
Key Metrics
| Metric | What It Measures |
|---|---|
| trace.http.request.hits | Request count |
| trace.http.request.duration | Latency |
| trace.http.request.errors | Error count |
| trace.http.request.apdex | User satisfaction |
Service Level Objectives
Link APM to SLOs:
    pup slos create --file slo.json
Common Queries
| Goal | Query |
|---|---|
| Slowest endpoints | avg:trace.http.request.duration{*} by {resource_name} |
| Error rate | sum:trace.http.request.errors{*} / sum:trace.http.request.hits{*} |
| Throughput | sum:trace.http.request.hits{*}.as_rate() |
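
The error-rate query is a plain ratio of errors to hits over the same window. As a worked example with made-up numbers (12 errors out of 4800 requests), the arithmetic can be sketched as below. `error_rate` is a hypothetical helper, and it prints a percentage, whereas the query itself returns the raw ratio.

```shell
#!/bin/sh
# Hypothetical helper (not a pup command).
# error_rate ERRORS HITS
# Prints errors/hits as a percentage with two decimals,
# or "n/a" when there were no requests at all.
error_rate() {
    awk -v e="$1" -v h="$2" 'BEGIN {
        if (h == 0) { print "n/a"; exit }
        printf "%.2f\n", (e / h) * 100
    }'
}

# Illustrative values only: 12 errors in 4800 requests.
error_rate 12 4800
```

The zero-hits guard matters in practice: a service with no traffic in the window has no meaningful error rate, and dividing anyway would fail.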
Troubleshooting
| Problem | Fix |
|---|---|
| No traces | Check that ddtrace is installed and DD_TRACE_ENABLED=true |
| Missing service | Verify the DD_SERVICE env var |
| Traces not linked | Check that trace headers are propagated between services |
| High cardinality | Don't tag with user_id/request_id |
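
The two environment-variable checks in the table can be run from the shell that launches the service. This is a rough sketch: `check_tracer_env` is a hypothetical function, and it assumes that an unset `DD_TRACE_ENABLED` means tracing is left at its default (enabled), so only an explicit false-like value is reported.

```shell
#!/bin/sh
# Hypothetical helper (not a pup or ddtrace command).
# Reports problems with the two variables named in the table above.
check_tracer_env() {
    status=0

    # "Missing service": DD_SERVICE must be set and non-empty.
    if [ -z "${DD_SERVICE:-}" ]; then
        echo "DD_SERVICE is not set"
        status=1
    fi

    # "No traces": flag an explicit opt-out.
    # Assumption: an unset variable leaves tracing enabled.
    case "${DD_TRACE_ENABLED:-true}" in
        false|False|FALSE|0)
            echo "DD_TRACE_ENABLED disables tracing"
            status=1
            ;;
    esac

    if [ "$status" -eq 0 ]; then
        echo "tracer env looks OK"
    fi
    return "$status"
}

check_tracer_env || echo "fix the items above, then restart the service"
```

The check only inspects the current shell's environment; a service started by systemd or a container runtime may see different values.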