Skill v1.0.1

Automated scan: 100/100

datadog-labs/agent-skills/dd-apm

~1 modified

──Details

Published: May 27, 2026 at 10:12 PM

Content Hash: sha256:acb02f69e0fd2d07...

Git SHA: ef2c1bd35084

Bump Type: patch

──Files

Files (1 file, 6.8 KB)

SKILL.md · 180 lines · 6.8 KB · active

version: "1.0.1"
name: dd-apm
description: APM - install, onboard, instrument, enable, set up, configure, traces, services, dependencies, performance analysis. Use for any request involving Datadog APM setup, instrumentation (SSI, ddtrace, agent install), or analysis.
alwaysApply: true
metadata:
  version: "1.1.0"
  author: datadog-labs
  repository: https://github.com/datadog-labs/agent-skills
  tags: datadog,apm,tracing,performance,distributed-tracing,dd-apm,install,onboarding,instrumentation,ssi,agent
  globs: "/ddtrace*,/datadog*.yaml,**/trace"

Datadog APM

Distributed tracing, service maps, and performance analysis.

Routing — Read This First

Match the user's request to one of the entries below. Each entry has the same shape: triggers → which sub-skill to load → the anti-pattern to avoid. If a request seems to fit more than one entry, see "Overlap disambiguation". If nothing matches, see "None of the above" at the end.

Kubernetes APM install / instrument / onboard — trigger when the user mentions Kubernetes, K8s, EKS, GKE, AKS, kind, minikube, K3s, helm, DatadogAgent CR, kubectl, SSI on a cluster, pod injection, or init containers.

Immediately read .claude/skills/dd-apm/k8s-ssi/agent-install/SKILL.md now, then .claude/skills/dd-apm/k8s-ssi/enable-ssi/SKILL.md, then .claude/skills/dd-apm/k8s-ssi/verify-ssi/SKILL.md — do not proceed from memory.

⚠️ Common wrong approaches that LOOK like they work but silently fail:
- helm install datadog datadog/datadog — the standard chart does NOT support SSI via DatadogAgent CR.
- Adding ddtrace imports or ddtrace-run to the app — SSI auto-instruments WITHOUT any code changes.
- admission.datadoghq.com/enabled annotations — that's admission controller config injection, not SSI init container injection.

Linux APM install / instrument / onboard — trigger when the user mentions a single host, VM, EC2 instance, bare-metal, RHEL/Ubuntu/Debian, systemd, or no orchestrator.

Immediately read .claude/skills/dd-apm/linux-ssi/agent-install/SKILL.md now, then .claude/skills/dd-apm/linux-ssi/enable-ssi/SKILL.md, then .claude/skills/dd-apm/linux-ssi/verify-ssi/SKILL.md — do not proceed from memory.

⚠️ Do NOT install the agent via plain apt-get install datadog-agent (or yum equivalent) and assume SSI follows — host auto-instrumentation requires the install script with the SSI flags, which the sub-skill walks through.

Service rename / service remapping — trigger when the user mentions renaming a service, collapsing multiple service names, stripping suffixes/prefixes, or cleaning up inferred services.

Immediately read .claude/skills/dd-apm/service-remapping/SKILL.md now — do not proceed from memory.

⚠️ Do NOT change tags.datadoghq.com/service labels or DD_SERVICE env vars to rename a service in Datadog. That requires a rollout and only affects new data. Use a service remapping rule — it rewrites the name at ingestion time with no deployment change.

Overlap disambiguation

When a request could plausibly fit more than one entry above, use these tiebreakers:

| Hint | Route to |
| --- | --- |
| Cluster orchestrator mentioned (EKS/GKE/AKS/kind/K3s/minikube), even if "just one node" | k8s-ssi |
| Single host, VM, or EC2 with no orchestrator | linux-ssi |
| "Several services that should be one" | service-remapping. The sub-skill picks the rule type based on whether the duplicates are real instrumented services or inferred entities (DBs, queues, external APIs). |
| "My service shows under the wrong name" | First check `DD_SERVICE` on the deploy. If correct and the name is still wrong → service-remapping. |
| "Reduce APM volume / cost / noise" | No sub-skill yet. Ask whether the user means sampling (fewer ingested traces) or retention filters (less indexed data) before suggesting commands. |

None of the above

If the request doesn't match any entry above, continue reading the trace-search, service analysis, and metrics content below. If even that doesn't fit, ask the user to clarify — do not invent a workflow.
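
The routing entries and tiebreakers above can be read as a precedence order: orchestrator keywords win first, then rename/remap wording, then single-host keywords, and everything else falls through. The sketch below illustrates that order only. The function name `route_request`, the keyword lists, and the returned strings are hypothetical and not part of the skill; `kind` is left out of the keyword list because it is also an ordinary English word.

```bash
# Hypothetical helper: maps a free-text request to a sub-skill name.
route_request() {
  # Lowercase the request and turn every run of non-alphanumerics into one
  # space, so the patterns below match whole words only.
  words=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' ' ')
  padded=" $words "
  case "$padded" in
    # Any orchestrator keyword wins, even if a host keyword is also present.
    *" kubernetes "*|*" k8s "*|*" eks "*|*" gke "*|*" aks "*|*" minikube "*|*" k3s "*|*" helm "*|*" kubectl "*)
      echo "k8s-ssi" ;;
    *" rename "*|*" renaming "*|*" remap "*|*" remapping "*)
      echo "service-remapping" ;;
    *" ec2 "*|*" vm "*|*" rhel "*|*" ubuntu "*|*" debian "*|*" systemd "*|*" host "*)
      echo "linux-ssi" ;;
    *)
      # No entry matched: fall through to "None of the above".
      echo "none" ;;
  esac
}

route_request "Install APM on an EKS cluster with one EC2 node"
route_request "Reduce APM cost"
```

Keyword matching like this is only a first pass; ambiguous requests still need the clarifying questions described in the table.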

Requirements

The Datadog Labs `pup` CLI must be installed. If it is not, see Setup Pup.

Command Execution Order (Token-Efficient)

For scoped commands, use this order:

1. Check context first (prior outputs, conversation, saved values).
2. If a required value is missing, run a discovery command first.
3. If still ambiguous, ask the user to confirm.
4. Then run the target command.
5. Avoid speculative commands likely to fail.
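
As a rough illustration of the order above, the sketch below resolves one required value, the env tag. Everything in it is hypothetical: `APM_ENV` stands in for a value saved in context, `discover_env` is a placeholder for whatever discovery command applies (it reads a `DISCOVERED_ENVS` variable here so the example runs on its own), and a non-zero return means "ask the user".

```bash
# Placeholder discovery step: prints candidate env tags, one per line.
# A real workflow would run a discovery command here instead.
discover_env() {
  printf '%s\n' ${DISCOVERED_ENVS:-}
}

resolve_env() {
  # Step 1: a value already known from context wins.
  if [ -n "${APM_ENV:-}" ]; then
    echo "$APM_ENV"
    return 0
  fi
  # Step 2: the value is missing, so run discovery.
  candidates=$(discover_env)
  count=$(printf '%s\n' "$candidates" | grep -c . || true)
  # Step 3: exactly one candidate is unambiguous; otherwise ask the user.
  if [ "$count" -eq 1 ]; then
    echo "$candidates"
    return 0
  fi
  echo "env is ambiguous or unknown: ask the user to confirm" >&2
  return 1
}

# Step 4: run the target command only once the value is resolved, e.g.
#   env=$(resolve_env) && pup apm services list --env "$env" --from 1h --to now
```

Failing early with a prompt for the user, rather than guessing a value, is what keeps speculative commands out of the run.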

Quick Start

```bash
pup auth login
# Confirm env tag with the user first (do not assume production/prod/prd).
pup apm services list --env <env> --from 1h --to now
pup traces search --query "service:api-gateway" --from 1h
```

Services

List Services

```bash
pup apm services list --env <env> --from 1h --to now
pup apm services stats --env <env> --from 1h --to now
```

Service Stats

```bash
pup apm services stats --env <env> --from 1h --to now
```

Service Map

```bash
# View dependencies
pup apm flow-map --query "service:api-gateway&from=$(($(date +%s)-3600))000&to=$(date +%s)000" --env <env> --limit 10
```
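
The `from` and `to` values in the flow-map query are epoch timestamps in milliseconds, built by appending `000` to a timestamp in seconds. A minimal sketch of the same arithmetic as a reusable helper follows; the name `epoch_ms_window` is an illustrative assumption, not a `pup` feature.

```bash
# Prints "<from_ms> <to_ms>" for a time window that ends now.
# $1 = window length in seconds (defaults to 3600, i.e. one hour).
epoch_ms_window() {
  window_s=${1:-3600}
  now_s=$(date +%s)
  # Multiply by 1000 to convert seconds to milliseconds.
  echo "$(( (now_s - window_s) * 1000 )) $(( now_s * 1000 ))"
}

# Build the same query string as the command above.
window=$(epoch_ms_window 3600)
from_ms=${window% *}
to_ms=${window#* }
query="service:api-gateway&from=${from_ms}&to=${to_ms}"
echo "$query"
```

Reading the clock once, rather than calling `date` twice as the one-liner does, keeps the window exactly one hour wide even if the second ticks over between the two calls.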

Traces

Search Traces

```bash
# By service
pup traces search --query "service:api-gateway" --from 1h

# Errors only
pup traces search --query "service:api-gateway status:error" --from 1h

# Slow traces (>1s)
pup traces search --query "service:api-gateway @duration:>1000ms" --from 1h

# With specific tag
pup traces search --query "service:api-gateway @http.url:/api/users" --from 1h
```
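
Each query above is a service filter followed by optional space-separated terms, which the search treats as an implicit AND. A small sketch of composing such a string is shown below. `build_trace_query` is a hypothetical helper, and it does no escaping or validation of the terms it is given.

```bash
# Hypothetical helper: joins a service name and extra filter terms
# into one space-separated query string.
# $1 = service name, remaining arguments = additional terms.
build_trace_query() {
  query="service:$1"
  shift
  for term in "$@"; do
    query="$query $term"
  done
  printf '%s\n' "$query"
}

build_trace_query api-gateway status:error
build_trace_query api-gateway status:error "@http.url:/api/users"
```

The resulting string can be passed as the `--query` value of the `pup traces search` commands above.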

Trace Detail

```bash
# No direct get command for a single trace ID.
# Use traces search with a narrow query and time window.
pup traces search --query "trace_id:<trace_id>" --from 1h
```

Key Metrics

| Metric | What It Measures |
| --- | --- |
| `trace.http.request.hits` | Request count |
| `trace.http.request.duration` | Latency |
| `trace.http.request.errors` | Error count |
| `trace.http.request.apdex` | Apdex score (user satisfaction) |

Service Level Objectives

Link APM to SLOs:

```bash
pup slos create --file slo.json
```
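
The skill does not show what `slo.json` contains. As a sketch only, the helper below prints a metric-based SLO built from the APM metrics listed under Key Metrics. The field names follow the request body of the Datadog SLO API; whether `pup slos create` expects exactly this shape is an assumption, and the SLO name, target, and `service:api-gateway` scope are illustrative values.

```bash
# Hypothetical helper: prints an example metric-based SLO definition.
# Assumption: pup accepts the same fields as the Datadog SLO API body.
print_slo_json() {
  cat <<'EOF'
{
  "name": "api-gateway request success rate",
  "type": "metric",
  "description": "Share of api-gateway requests that complete without error.",
  "query": {
    "numerator": "sum:trace.http.request.hits{service:api-gateway}.as_count() - sum:trace.http.request.errors{service:api-gateway}.as_count()",
    "denominator": "sum:trace.http.request.hits{service:api-gateway}.as_count()"
  },
  "thresholds": [
    { "timeframe": "30d", "target": 99.9 }
  ],
  "tags": ["service:api-gateway"]
}
EOF
}

# Print the definition; redirect to slo.json to use it with the command above.
print_slo_json
```

The numerator counts good events (hits minus errors) and the denominator counts all hits, so the SLO tracks the fraction of successful requests.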

Common Queries

| Goal | Query |
| --- | --- |
| Slowest endpoints | `avg:trace.http.request.duration{*} by {resource_name}` |
| Error rate | `sum:trace.http.request.errors{*} / sum:trace.http.request.hits{*}` |
| Throughput | `sum:trace.http.request.hits{*}.as_rate()` |
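
The error-rate query is a plain ratio of error count to request count. A worked example of that arithmetic, with made-up counts, is sketched below; the helper name `error_rate` is illustrative and the zero-traffic guard is an added assumption about how to treat an empty window.

```bash
# error_rate <errors> <hits>: prints the error percentage with two decimals.
error_rate() {
  # Guard against division by zero when the window had no traffic.
  if [ "$2" -eq 0 ]; then
    echo "0.00"
    return 0
  fi
  LC_ALL=C awk -v e="$1" -v h="$2" 'BEGIN { printf "%.2f\n", (e / h) * 100 }'
}

# 25 errors out of 1000 requests is a 2.50% error rate.
error_rate 25 1000
```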

Troubleshooting

| Problem | Fix |
| --- | --- |
| No traces | Check that ddtrace is installed and DD_TRACE_ENABLED=true |
| Missing service | Verify the DD_SERVICE env var |
| Traces not linked | Check that trace headers are propagated between services |
| High cardinality | Don't tag with user_id/request_id |
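
The first two rows can be checked from inside the application's environment before digging further. The sketch below is a hypothetical pre-flight check, not part of the skill. It assumes tracing stays on unless `DD_TRACE_ENABLED` is explicitly set to a false value, and it only inspects environment variables, so it cannot tell whether ddtrace itself is installed.

```bash
# Hypothetical pre-flight check for the env vars named in the table.
# Prints one line per problem found and returns non-zero if any exist.
check_apm_env() {
  status=0
  # DD_SERVICE sets the service name that traces are reported under.
  if [ -z "${DD_SERVICE:-}" ]; then
    echo "DD_SERVICE is not set"
    status=1
  fi
  # Assumption: an unset DD_TRACE_ENABLED means tracing is enabled.
  case "${DD_TRACE_ENABLED:-true}" in
    false|False|FALSE)
      echo "DD_TRACE_ENABLED disables tracing"
      status=1 ;;
  esac
  return "$status"
}

check_apm_env || echo "fix the items above, then re-check for traces"
```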

References/Docs
