---
version: "1.0.1"
name: gcp-gke-monitoring-observability
description: |
  Set up logging, metrics, distributed tracing, and alerting for GKE applications. Use when configuring Cloud Logging, creating dashboards in Cloud Monitoring, instrumenting Spring Boot with metrics, setting up alerts for error rates or resource usage, or implementing distributed tracing with Cloud Trace. Includes Prometheus integration, structured logging patterns, and observability best practices for microservices.
allowed-tools:
  - Bash
  - Read
  - Write
  - Glob
---
GKE Monitoring and Observability
Purpose
Implement comprehensive observability for GKE applications. This skill covers logging, metrics collection, visualization, distributed tracing, and alerting strategies for Spring Boot microservices.
When to Use
Use this skill when you need to:
- Set up Cloud Logging and Cloud Monitoring for GKE applications
- Instrument Spring Boot applications with Actuator metrics
- Configure Prometheus scraping for custom metrics
- Create dashboards to visualize application performance
- Set up alerts for error rates, resource usage, or SLOs
- Enable distributed tracing with Cloud Trace
- Debug production issues using logs and metrics
Trigger phrases: "set up monitoring", "GKE observability", "configure Prometheus", "create dashboard", "set up alerts", "enable tracing"
Table of Contents
- Purpose
- When to Use
- Quick Start
- Instructions
- Step 1: Enable Cloud Operations Integration
- Step 2: Configure Spring Boot Actuator
- Step 3: Annotate Pods for Prometheus Scraping
- Step 4: View Logs
- Step 5: Create Monitoring Dashboard
- Step 6: Set Up Alerts
- Step 7: Enable Distributed Tracing
- Examples
- Requirements
- See Also
Quick Start
Enable observability in three steps:
```bash
# 1. Enable Cloud Logging, Cloud Monitoring, and Managed Service for Prometheus on the cluster
gcloud container clusters update CLUSTER_NAME \
  --region=europe-west2 \
  --logging=SYSTEM,WORKLOAD \
  --monitoring=SYSTEM \
  --enable-managed-prometheus

# 2. Tell Managed Service for Prometheus to scrape the Spring Boot Actuator endpoint
kubectl apply -f - <<EOF
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: supplier-charges-hub
  namespace: wtr-supplier-charges
spec:
  selector:
    matchLabels:
      app: supplier-charges-hub
  endpoints:
  - port: metrics
    path: /actuator/prometheus
    interval: 30s
EOF

# 3. Read recent container logs from the CLI (metrics are under Monitoring in the Cloud Console)
gcloud logging read "resource.type=k8s_container AND resource.labels.namespace_name=wtr-supplier-charges" --limit=50
```
Instructions
Step 1: Enable Cloud Operations Integration
Configure the cluster to collect logs and metrics:
```bash
# Application metrics come from Managed Service for Prometheus, so --monitoring only needs SYSTEM
gcloud container clusters update shared-gke-labs-01-euw2 \
  --region=europe-west2 \
  --logging=SYSTEM,WORKLOAD \
  --monitoring=SYSTEM \
  --enable-managed-prometheus
```
Components:
- Cloud Logging: Captures container stdout/stderr
- Cloud Monitoring: System metrics (CPU, memory, disk)
- Managed Service for Prometheus: Application metrics (scraped from pods selected by a PodMonitoring resource)
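After the update finishes, you can read the cluster configuration back to confirm which components are enabled. The function below is a minimal sketch. has_components is a hypothetical helper, not part of gcloud. It assumes that gcloud's value() format prints a list field such as loggingConfig.componentConfig.enableComponents as a semicolon-separated string.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if every required component appears in a
# semicolon-separated list such as "SYSTEM_COMPONENTS;WORKLOADS".
# has_components is a hypothetical helper for illustration.
has_components() {
  local list="$1"; shift
  local -a items
  local required item found
  # Split on ';' (IFS is changed only for this read)
  IFS=';' read -r -a items <<< "$list"
  for required in "$@"; do
    found=0
    for item in "${items[@]}"; do
      if [[ "$item" == "$required" ]]; then found=1; fi
    done
    if (( found == 0 )); then
      echo "missing: $required"
      return 1
    fi
  done
  echo "ok"
}

# Example against a live cluster (requires gcloud access):
# components=$(gcloud container clusters describe shared-gke-labs-01-euw2 \
#   --region=europe-west2 \
#   --format="value(loggingConfig.componentConfig.enableComponents)")
# has_components "$components" SYSTEM_COMPONENTS WORKLOADS
```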
Step 2: Configure Spring Boot Actuator
Enable metrics and health endpoints in Spring Boot:
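```yaml
# application.yml
management:
  endpoints:
    web:
      exposure:
        # env and configprops reveal configuration details; restrict access outside lab environments
        include: health,info,metrics,prometheus,env,configprops
  endpoint:
    health:
      probes:
        enabled: true
      show-details: always
    metrics:
      enabled: true
  metrics:
    distribution:
      percentiles-histogram:
        http.server.requests: true
    tags:
      application: supplier-charges-hub
      environment: labs
  health:
    livenessstate:
      enabled: true
    readinessstate:
      enabled: true
logging:
  pattern:
    # One JSON object per line; Cloud Logging takes the entry's severity from the "severity" key.
    # %m is not JSON-escaped, so messages containing quotes or newlines produce invalid JSON.
    console: '{"timestamp":"%d{ISO8601}","severity":"%p","logger":"%c{1}","message":"%m"}%n'
```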
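A pattern layout inserts %m into the JSON string verbatim. A message that contains a double quote, a backslash, or a newline therefore yields a line that is not valid JSON. The shell sketch below only illustrates the escaping that a real JSON encoder for your logging framework performs. json_escape and log_json are hypothetical helpers, and control characters other than newline, carriage return, and tab are not handled.

```shell
#!/usr/bin/env bash
# Sketch: escape a string for use inside a JSON string literal.
# Handles backslash, double quote, newline, carriage return, and tab only.
json_escape() {
  local in="$1" out="" ch i
  for (( i = 0; i < ${#in}; i++ )); do
    ch="${in:i:1}"
    case "$ch" in
      '\')   out+='\\' ;;
      '"')   out+='\"' ;;
      $'\n') out+='\n' ;;
      $'\r') out+='\r' ;;
      $'\t') out+='\t' ;;
      *)     out+="$ch" ;;
    esac
  done
  printf '%s' "$out"
}

# Emit one structured log line using the "severity" key that Cloud Logging reads
log_json() {
  local severity="$1" logger="$2" message="$3"
  printf '{"severity":"%s","logger":"%s","message":"%s"}\n' \
    "$severity" "$(json_escape "$logger")" "$(json_escape "$message")"
}

# Example: log_json ERROR ChargeService 'Supplier "ACME" rejected'
```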
Step 3: Annotate Pods for Prometheus Scraping
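Expose a named metrics port on the pods and select them for scraping. The prometheus.io/* annotations are a convention read by self-managed Prometheus scrape configs. Managed Service for Prometheus ignores them and scrapes only the pods matched by a PodMonitoring resource:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: supplier-charges-hub
  namespace: wtr-supplier-charges
spec:
  selector:
    matchLabels:
      app: supplier-charges-hub
  template:
    metadata:
      labels:
        app: supplier-charges-hub
      annotations:
        # Read by self-managed Prometheus; ignored by Managed Service for Prometheus
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/actuator/prometheus"
    spec:
      containers:
      - name: supplier-charges-hub-container  # image and other settings omitted
        ports:
        - name: metrics
          containerPort: 8080
          protocol: TCP
---
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: supplier-charges-hub
  namespace: wtr-supplier-charges
spec:
  selector:
    matchLabels:
      app: supplier-charges-hub
  endpoints:
  - port: metrics
    path: /actuator/prometheus
    interval: 30s
```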
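Before you rely on the scrape, it helps to check what the endpoint actually exposes. This sketch reads Prometheus text-format output on stdin, for example the body of /actuator/prometheus, and sums http_server_requests_seconds_count per status label. It assumes Micrometer's default metric and label names and one sample per line with no timestamp. requests_by_status is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: total http_server_requests_seconds_count samples by status label.
# Reads Prometheus text exposition format on stdin; prints "STATUS COUNT" lines.
requests_by_status() {
  awk '
    /^http_server_requests_seconds_count[{]/ {
      if (match($0, /status="[0-9]+"/)) {
        # Skip the 8 characters of the status=" prefix and drop the closing quote
        status = substr($0, RSTART + 8, RLENGTH - 9)
        counts[status] += $NF   # the sample value is the last field
      }
    }
    END { for (s in counts) printf "%s %d\n", s, counts[s] }
  ' | sort
}

# Example (POD_NAME is a placeholder; curl must exist in the container):
# kubectl exec POD_NAME -c supplier-charges-hub-container -n wtr-supplier-charges -- \
#   curl -s http://localhost:8080/actuator/prometheus | requests_by_status
```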
Step 4: View Logs
Query container logs from Cloud Logging:
```bash
# View recent logs (JSON-formatted lines land in jsonPayload, plain text in textPayload)
gcloud logging read "resource.type=k8s_container AND resource.labels.namespace_name=wtr-supplier-charges" \
  --limit=50 \
  --format=json | jq '.[] | {timestamp: .timestamp, message: (.textPayload // .jsonPayload.message)}'

# View logs with severity filter
gcloud logging read "resource.type=k8s_container AND resource.labels.namespace_name=wtr-supplier-charges AND severity=ERROR" \
  --limit=20

# View logs from a specific pod (container logs use the k8s_container resource type)
gcloud logging read "resource.type=k8s_container AND resource.labels.pod_name=supplier-charges-hub-xyz123 AND resource.labels.namespace_name=wtr-supplier-charges" \
  --limit=50
```
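The filters above repeat the same resource clauses, and a small builder keeps them consistent. log_filter is a hypothetical helper. Its optional minimum severity uses the Logging query language's >= comparison, so ERROR also matches CRITICAL and higher.

```shell
#!/usr/bin/env bash
# Sketch: build a Cloud Logging query for one namespace.
# Usage: log_filter NAMESPACE [MIN_SEVERITY] [SINCE_RFC3339]
log_filter() {
  local ns="$1" severity="${2:-}" since="${3:-}"
  local filter="resource.type=\"k8s_container\" AND resource.labels.namespace_name=\"${ns}\""
  if [[ -n "$severity" ]]; then
    filter+=" AND severity>=${severity}"
  fi
  if [[ -n "$since" ]]; then
    filter+=" AND timestamp>=\"${since}\""
  fi
  printf '%s\n' "$filter"
}

# Example (GNU date):
# gcloud logging read "$(log_filter wtr-supplier-charges ERROR "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)")" --limit=20
```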
Alternative: View via kubectl:
```bash
# Stream logs (deployment/NAME picks one pod of the Deployment)
kubectl logs -f deployment/supplier-charges-hub -n wtr-supplier-charges

# View logs from specific container (for multi-container pods)
kubectl logs deployment/supplier-charges-hub -c supplier-charges-hub-container -n wtr-supplier-charges

# View previous logs (after pod restart)
kubectl logs deployment/supplier-charges-hub -n wtr-supplier-charges --previous
```
Step 5: Create Monitoring Dashboard
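Define alerting rules for the key metrics you chart. With Managed Service for Prometheus, rules are deployed as a Rules resource and evaluated by the managed rule evaluator. The dashboards themselves are built in Cloud Monitoring, as in Example 4:
```yaml
apiVersion: monitoring.googleapis.com/v1
kind: Rules
metadata:
  name: supplier-charges-hub-metrics
  namespace: wtr-supplier-charges
spec:
  groups:
  - name: application-metrics
    interval: 30s
    rules:
    - alert: HighErrorRate
      # Share of 5xx responses among all responses over 5 minutes
      expr: |
        sum(rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
          / sum(rate(http_server_requests_seconds_count[5m])) > 0.05
      for: 5m
      annotations:
        summary: "High error rate detected (> 5% of requests)"
    - alert: HighMemoryUsage
      # Needs cAdvisor metrics to be collected; containers without a memory limit are excluded
      expr: container_memory_working_set_bytes / (container_spec_memory_limit_bytes > 0) > 0.85
      for: 10m
      annotations:
        summary: "Pod memory usage > 85%"
```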
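An error-rate alert usually compares failed requests with all requests, not with a fixed number per second. The sketch below shows the arithmetic. Given two samples of each cumulative counter, the window length cancels out of the ratio. error_ratio is a hypothetical helper. Unlike PromQL's rate(), it does not handle counter resets or extrapolate to the window edges.

```shell
#!/usr/bin/env bash
# Sketch: error ratio between two samples of cumulative counters.
# Mirrors sum(rate(errors[w])) / sum(rate(total[w])); the window w cancels out.
# Usage: error_ratio ERRORS_START ERRORS_END TOTAL_START TOTAL_END
error_ratio() {
  awk -v e0="$1" -v e1="$2" -v t0="$3" -v t1="$4" 'BEGIN {
    requests = t1 - t0
    if (requests <= 0) { printf "%.4f\n", 0; exit }   # no traffic: report 0
    printf "%.4f\n", (e1 - e0) / requests
  }'
}
```

Consider 30 new 5xx responses out of 600 requests in five minutes. That is 0.1 errors per second, which exceeds a bare rate threshold of 0.05. The error ratio, however, is exactly 0.05 (error_ratio 10 40 1000 1600), so it does not exceed a > 0.05 ratio threshold.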
Step 6: Set Up Alerts
Create alert policies for critical metrics:
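```bash
# Alert policies need a log-based metric that counts ERROR entries.
# "supplier_charges_errors" is an example metric name; choose your own.
gcloud logging metrics create supplier_charges_errors \
  --description="ERROR log entries in wtr-supplier-charges" \
  --log-filter='resource.type="k8s_container" AND resource.labels.namespace_name="wtr-supplier-charges" AND severity>=ERROR'

# Create alert for a sustained high error count
cat > alert-policy.yaml <<EOF
displayName: "Supplier Charges Hub - High Error Count"
combiner: OR
conditions:
- displayName: "More than 5 ERROR entries per minute for 5 minutes"
  conditionThreshold:
    filter: >-
      resource.type="k8s_container"
      AND resource.labels.namespace_name="wtr-supplier-charges"
      AND metric.type="logging.googleapis.com/user/supplier_charges_errors"
    aggregations:
    - alignmentPeriod: 60s
      perSeriesAligner: ALIGN_SUM
      crossSeriesReducer: REDUCE_SUM
    comparison: COMPARISON_GT
    thresholdValue: 5
    duration: 300s
notificationChannels:
- projects/ecp-wtr-supplier-charges-labs/notificationChannels/12345
EOF

# Create the policy (the notification channel must already exist)
gcloud alpha monitoring policies create --policy-from-file=alert-policy.yaml
```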
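A threshold condition's duration means the value must stay above thresholdValue for that whole period before an incident opens. Take a simplified model, which is an assumption and not Cloud Monitoring's exact evaluator, with one aligned point per alignment period. A 300s duration with, say, a 60s alignment period then amounts to five consecutive points above the threshold. would_fire is a hypothetical helper for reasoning about that.

```shell
#!/usr/bin/env bash
# Simplified model (assumption): fire once REQUIRED consecutive aligned values
# exceed THRESHOLD; REQUIRED corresponds to duration / alignmentPeriod.
# Usage: would_fire THRESHOLD REQUIRED VALUE...
would_fire() {
  local threshold="$1" required="$2"; shift 2
  local run=0 v
  for v in "$@"; do
    # Numeric comparison via awk so decimal values work
    if awk -v v="$v" -v t="$threshold" 'BEGIN { exit !(v > t) }'; then
      run=$((run + 1))
      if (( run >= required )); then echo "fire"; return 0; fi
    else
      run=0   # a point at or below the threshold resets the streak
    fi
  done
  echo "ok"
  return 1
}
```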
Step 7: Enable Distributed Tracing
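Export request traces to Cloud Trace. Spring Cloud Sleuth supports only Spring Boot 2.x. On Spring Boot 3, tracing comes from Micrometer Tracing, and the Spring Cloud GCP trace starter sends the spans to Cloud Trace:
```kotlin
// build.gradle.kts
// springCloudGcpVersion is defined in gradle.properties; pick the release line for your Spring Boot version
val springCloudGcpVersion: String by project

dependencies {
    implementation(platform("com.google.cloud:spring-cloud-gcp-dependencies:$springCloudGcpVersion"))
    // Micrometer Tracing plus a reporter that exports spans to Cloud Trace
    implementation("com.google.cloud:spring-cloud-gcp-starter-trace")
}
```
Configuration:
```yaml
# application.yml
management:
  tracing:
    sampling:
      probability: 0.1 # Sample 10% of requests
spring:
  cloud:
    gcp:
      trace:
        enabled: true # Export spans to Cloud Trace (the workload identity needs roles/cloudtrace.agent)
```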
View traces:
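```bash
# Traces are browsed in the Cloud Console under Trace Explorer,
# or listed through the Cloud Trace v1 REST API:
PROJECT="ecp-wtr-supplier-charges-labs"
curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://cloudtrace.googleapis.com/v1/projects/${PROJECT}/traces?pageSize=10"

# Fetch a single trace
curl -s -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://cloudtrace.googleapis.com/v1/projects/${PROJECT}/traces/TRACE_ID"
```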
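To look up one request's trace, you need its trace ID. This sketch assumes W3C Trace Context propagation (B3 headers are laid out differently). Under that format, the traceparent header carries a 32-hex-digit trace ID, the same form Cloud Trace uses, plus a flags byte whose lowest bit marks the request as sampled. parse_traceparent is a hypothetical helper. It does not reject the all-zero IDs that the specification treats as invalid.

```shell
#!/usr/bin/env bash
# Sketch: split a W3C traceparent header (version-traceid-parentid-flags)
# and report the trace ID and whether the sampled flag (bit 0x01) is set.
parse_traceparent() {
  local re='^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$'
  if [[ ! "$1" =~ $re ]]; then
    echo "invalid traceparent: $1" >&2
    return 1
  fi
  local trace_id="${BASH_REMATCH[2]}" flags="${BASH_REMATCH[4]}"
  local sampled=$(( 16#$flags & 1 ))
  echo "trace_id=$trace_id sampled=$sampled"
}

# Example:
# parse_traceparent 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```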
Examples
Example 1: Complete Observability Setup
```bash
#!/bin/bash
# Set up complete observability stack for Supplier Charges Hub
set -euo pipefail

CLUSTER="shared-gke-labs-01-euw2"
REGION="europe-west2"
PROJECT="ecp-wtr-supplier-charges-labs"
NAMESPACE="wtr-supplier-charges"

echo "=== Setting Up GKE Observability ==="

# Step 1: Enable cluster-level logging, monitoring, and Managed Service for Prometheus
echo ""
echo "1. Enabling Cloud Logging and Monitoring..."
gcloud container clusters update "$CLUSTER" \
  --region="$REGION" \
  --project="$PROJECT" \
  --logging=SYSTEM,WORKLOAD \
  --monitoring=SYSTEM \
  --enable-managed-prometheus

# Step 2: Store the Spring Boot metrics configuration in a ConfigMap
# (the Deployment must mount it and point Spring Boot at it, e.g. via spring.config.additional-location)
echo ""
echo "2. Configuring Spring Boot Actuator..."
kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: application-config
  namespace: $NAMESPACE
data:
  application.yml: |
    management:
      endpoints:
        web:
          exposure:
            include: health,info,metrics,prometheus
      endpoint:
        health:
          probes:
            enabled: true
      metrics:
        distribution:
          percentiles-histogram:
            http.server.requests: true
      health:
        livenessstate:
          enabled: true
        readinessstate:
          enabled: true
    logging:
      pattern:
        console: '{"timestamp":"%d{ISO8601}","severity":"%p","message":"%m"}%n'
EOF

# Step 3: Select the pods for scraping by Managed Service for Prometheus
# (assumes pods carry the label app=supplier-charges-hub and a container port named "metrics")
echo ""
echo "3. Creating PodMonitoring for /actuator/prometheus..."
kubectl apply -f - <<EOF
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: supplier-charges-hub
  namespace: $NAMESPACE
spec:
  selector:
    matchLabels:
      app: supplier-charges-hub
  endpoints:
  - port: metrics
    path: /actuator/prometheus
    interval: 30s
EOF

# Step 4: Dashboards are created in the Cloud Console or through the Cloud Monitoring API
echo ""
echo "4. Create dashboards in Cloud Monitoring (see Example 4)"

echo ""
echo "Observability setup complete!"
echo ""
echo "Next steps:"
echo "1. View logs: gcloud logging read \"resource.type=k8s_container AND resource.labels.namespace_name=$NAMESPACE\" --limit=50"
echo "2. Access Cloud Console: https://console.cloud.google.com/monitoring"
echo "3. Create dashboards in Cloud Monitoring"
```
Example 2: Log Analysis and Error Tracking
```bash
#!/bin/bash
# Query logs for errors and build report
NAMESPACE="wtr-supplier-charges"
HOURS=24
# GNU date syntax; on macOS use: date -u -v-"${HOURS}"H +%Y-%m-%dT%H:%M:%SZ
SINCE=$(date -u -d "$HOURS hours ago" +%Y-%m-%dT%H:%M:%SZ)
BASE_FILTER="resource.type=k8s_container AND resource.labels.namespace_name=$NAMESPACE AND timestamp>=\"$SINCE\""

echo "=== Log Analysis Report ==="
echo ""
echo "1. Total Log Entries (last $HOURS hours)"
gcloud logging read "$BASE_FILTER" --format="value(severity)" | wc -l

echo ""
echo "2. Error Count (last $HOURS hours)"
gcloud logging read "$BASE_FILTER AND severity=ERROR" --format="value(severity)" | wc -l

echo ""
echo "3. Top Error Messages"
gcloud logging read "$BASE_FILTER AND severity=ERROR" \
  --limit=50 \
  --format="value(textPayload)" | sort | uniq -c | sort -rn | head -10

echo ""
echo "4. Exception Traces"
gcloud logging read "$BASE_FILTER AND textPayload=~\"Exception\"" \
  --limit=20 \
  --format="json" | jq '.[] | {pod: .resource.labels.pod_name, timestamp: .timestamp, message: .textPayload[:200]}'
```
Example 3: Health Check Endpoint Testing
```bash
#!/bin/bash
# Test and verify observability endpoints
# Assumes curl is available inside the application container image
NAMESPACE="wtr-supplier-charges"
CONTAINER="supplier-charges-hub-container"
POD=$(kubectl get pods -l app=supplier-charges-hub -n "$NAMESPACE" -o jsonpath='{.items[0].metadata.name}')

echo "=== Testing Observability Endpoints ==="
echo ""
echo "1. Health Status"
kubectl exec "$POD" -c "$CONTAINER" -n "$NAMESPACE" -- \
  curl -s http://localhost:8080/actuator/health | jq .

echo ""
echo "2. Application Metrics Available"
kubectl exec "$POD" -c "$CONTAINER" -n "$NAMESPACE" -- \
  curl -s http://localhost:8080/actuator/metrics | jq '.names | length'

echo ""
echo "3. HTTP Request Metrics"
kubectl exec "$POD" -c "$CONTAINER" -n "$NAMESPACE" -- \
  curl -s http://localhost:8080/actuator/metrics/http.server.requests | jq '.measurements[] | select(.statistic=="COUNT")'

echo ""
echo "4. JVM Memory Metrics"
kubectl exec "$POD" -c "$CONTAINER" -n "$NAMESPACE" -- \
  curl -s http://localhost:8080/actuator/metrics/jvm.memory.used | jq '.measurements[] | select(.statistic=="VALUE")'

echo ""
echo "5. Prometheus Metrics Format"
kubectl exec "$POD" -c "$CONTAINER" -n "$NAMESPACE" -- \
  curl -s http://localhost:8080/actuator/prometheus | head -20
```
Example 4: Create Custom Metrics Dashboard
```bash
#!/bin/bash
# Create a Cloud Monitoring dashboard for Supplier Charges Hub
# The error tile assumes a log-based counter metric named supplier_charges_errors (example name)
PROJECT="ecp-wtr-supplier-charges-labs"
DASHBOARD_NAME="supplier-charges-hub-dashboard"

cat > dashboard.json <<EOF
{
  "displayName": "Supplier Charges Hub Dashboard",
  "mosaicLayout": {
    "columns": 12,
    "tiles": [
      {
        "width": 6, "height": 4,
        "widget": {
          "title": "Container Restarts",
          "xyChart": {"dataSets": [{"timeSeriesQuery": {"timeSeriesFilter": {
            "filter": "metric.type=\"kubernetes.io/container/restart_count\" resource.type=\"k8s_container\" resource.label.namespace_name=\"wtr-supplier-charges\"",
            "aggregation": {"alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_DELTA"}
          }}}]}
        }
      },
      {
        "xPos": 6, "width": 6, "height": 4,
        "widget": {
          "title": "Error Log Entries",
          "xyChart": {"dataSets": [{"timeSeriesQuery": {"timeSeriesFilter": {
            "filter": "metric.type=\"logging.googleapis.com/user/supplier_charges_errors\" resource.type=\"k8s_container\" resource.label.namespace_name=\"wtr-supplier-charges\"",
            "aggregation": {"alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_SUM"}
          }}}]}
        }
      },
      {
        "yPos": 4, "width": 6, "height": 4,
        "widget": {
          "title": "Pod Memory Usage",
          "xyChart": {"dataSets": [{"timeSeriesQuery": {"timeSeriesFilter": {
            "filter": "metric.type=\"kubernetes.io/container/memory/used_bytes\" resource.type=\"k8s_container\" resource.label.namespace_name=\"wtr-supplier-charges\""
          }}}]}
        }
      },
      {
        "xPos": 6, "yPos": 4, "width": 6, "height": 4,
        "widget": {
          "title": "Pod CPU Usage (cores)",
          "xyChart": {"dataSets": [{"timeSeriesQuery": {"timeSeriesFilter": {
            "filter": "metric.type=\"kubernetes.io/container/cpu/core_usage_time\" resource.type=\"k8s_container\" resource.label.namespace_name=\"wtr-supplier-charges\"",
            "aggregation": {"alignmentPeriod": "60s", "perSeriesAligner": "ALIGN_RATE"}
          }}}]}
        }
      }
    ]
  }
}
EOF

# Create dashboard
gcloud monitoring dashboards create --config-from-file=dashboard.json \
  --project=$PROJECT

echo "Dashboard created: $DASHBOARD_NAME"
```
Requirements
- GKE cluster with Cloud Logging and Cloud Monitoring enabled
- Spring Boot application with Actuator dependency
- kubectl access to the cluster
- gcloud CLI configured
- Service account with observability permissions:
  - roles/monitoring.metricWriter
  - roles/logging.logWriter
  - roles/cloudtrace.agent (for distributed tracing)
See Also
- gke-deployment-strategies - Understand health checks
- gke-troubleshooting - Use logs to debug issues
- gke-cost-optimization - Monitor resource costs