Docker Health Checks
Monitor Docker containers and orchestration platforms
Overview
Docker containers need monitoring just like traditional services. This guide shows you how to integrate Telemetry.host with Dockerfile health checks, Docker Compose, Kubernetes, and Docker Swarm, and how to watch the Docker event stream for container crashes.
Docker Container Health Checks
Basic Dockerfile Health Check
Add a health check that also reports to Telemetry.host:
FROM ubuntu:22.04
# Install curl for monitoring
RUN apt-get update && apt-get install -y curl
# Your application setup
COPY app.sh /app.sh
RUN chmod +x /app.sh
# Health check that monitors locally AND reports externally.
# Only the local check decides container health; a failed report is ignored.
HEALTHCHECK --interval=5m --timeout=3s \
  CMD /app.sh --health-check || exit 1; \
      curl -sf -X POST https://telemetry.host/ping/{MONITOR_ID} || true
CMD ["/app.sh"]
Separate Health Check Script
Create a dedicated health check script:
#!/bin/bash
# healthcheck.sh
# Check if application is responding
if curl -sf http://localhost:8080/health > /dev/null; then
  # Application is healthy, report to monitoring
  curl -sf -X POST https://telemetry.host/ping/{MONITOR_ID} \
    -d '{"status":"success","message":"Container healthy"}'
  exit 0
else
  # Application is unhealthy
  curl -sf -X POST https://telemetry.host/ping/{MONITOR_ID} \
    -d '{"status":"error","message":"Container unhealthy"}'
  exit 1
fi
Use in Dockerfile:
COPY healthcheck.sh /healthcheck.sh
RUN chmod +x /healthcheck.sh
HEALTHCHECK --interval=5m --timeout=10s \
CMD /healthcheck.sh
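For images that ship Python but not bash or curl, the same logic can be sketched with the standard library only. This is a minimal sketch, not an official client: the file name healthcheck.py, the function names, and the injectable check and send parameters are all hypothetical, and the ping URL keeps the {MONITOR_ID} placeholder used above.

```python
#!/usr/bin/env python3
# healthcheck.py - hypothetical standard-library variant of healthcheck.sh
import http.client
import json
import urllib.error
import urllib.request

APP_URL = "http://localhost:8080/health"               # same endpoint as healthcheck.sh
PING_URL = "https://telemetry.host/ping/{MONITOR_ID}"  # placeholder, replace before use


def app_is_healthy(url=APP_URL, timeout=3):
    """Return True if the local health endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        return False


def report(status, message, url=PING_URL, timeout=5):
    """POST a check-in and return True on success; errors are swallowed."""
    body = json.dumps({"status": status, "message": message}).encode("utf-8")
    try:
        req = urllib.request.Request(url, data=body, method="POST")
        urllib.request.urlopen(req, timeout=timeout).close()
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return False


def run_check(check=app_is_healthy, send=report):
    """Return the exit code Docker should see: 0 healthy, 1 unhealthy."""
    if check():
        send("success", "Container healthy")
        return 0
    send("error", "Container unhealthy")
    return 1
```

To use it as the HEALTHCHECK command, end the script with sys.exit(run_check()) (after importing sys). Because report() never raises on network errors, an unreachable monitoring endpoint cannot mark the container unhealthy.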
Docker Compose Integration
Method 1: Health Check in Compose
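version: '3.8'
services:
  web:
    image: myapp:latest
    healthcheck:
      # Requires curl inside the image; ${MONITOR_ID} is substituted by Compose.
      # Note: this test passes whenever the ping succeeds. It does not probe
      # the application itself.
      test: ["CMD", "curl", "-sf", "-X", "POST",
             "https://telemetry.host/ping/${MONITOR_ID}",
             "-d", '{"status":"success"}']
      interval: 5m
      timeout: 10s
      retries: 3
      start_period: 40s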
Method 2: Separate Monitor Container
Create a dedicated monitoring sidecar:
version: '3.8'
services:
  web:
    image: myapp:latest
  monitor:
    image: curlimages/curl:latest
    depends_on:
      - web
    environment:
      - MONITOR_ID=${MONITOR_ID}
    # $$ escapes the dollar sign so the shell, not Compose, expands MONITOR_ID.
    # Every command ends with ";" so the loop stays valid if YAML folds it onto one line.
    command: >
      sh -c "
      while true; do
        if curl -sf -o /dev/null http://web:8080/health; then
          curl -X POST https://telemetry.host/ping/$$MONITOR_ID -d '{\"status\":\"success\"}';
        else
          curl -X POST https://telemetry.host/ping/$$MONITOR_ID -d '{\"status\":\"error\"}';
        fi;
        sleep 300;
      done
      "
Method 3: External Monitoring Script
Run monitoring from the Docker host:
#!/bin/bash
# docker-monitor.sh
CONTAINER_NAME="myapp"
MONITOR_ID="your-monitor-id"
# Check if container is running
if docker ps --filter "name=$CONTAINER_NAME" --filter "status=running" | grep -q "$CONTAINER_NAME"; then
  # Container is running, check health
  HEALTH=$(docker inspect --format='{{.State.Health.Status}}' "$CONTAINER_NAME" 2>/dev/null)
  if [ "$HEALTH" = "healthy" ] || [ -z "$HEALTH" ]; then
    curl -X POST https://telemetry.host/ping/$MONITOR_ID \
      -d '{"status":"success","message":"Container running"}'
  else
    curl -X POST https://telemetry.host/ping/$MONITOR_ID \
      -d "{\"status\":\"error\",\"message\":\"Container unhealthy: $HEALTH\"}"
  fi
else
  curl -X POST https://telemetry.host/ping/$MONITOR_ID \
    -d '{"status":"error","message":"Container not running"}'
fi
Add to crontab:
*/5 * * * * /usr/local/bin/docker-monitor.sh
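The decision logic in docker-monitor.sh can be written as a small pure function, which makes the edge cases easier to see and test. This is a sketch with a hypothetical function name. It makes one assumption that differs from the shell script: a container whose health status is still "starting" is reported as a success rather than an error, on the reasoning that it is still inside its start period.

```python
def classify(running, health):
    """Map container state to a (status, message) pair for the check-in.

    running: True if the container is in the "running" state
    health:  value of .State.Health.Status, or "" when no HEALTHCHECK is defined
    """
    if not running:
        return "error", "Container not running"
    if health in ("", "healthy"):
        return "success", "Container running"
    if health == "starting":
        # Assumption (differs from the shell script): not yet a failure.
        return "success", "Container starting"
    return "error", f"Container unhealthy: {health}"
```

If "starting" should raise an alert, as it does in the shell version, drop that branch so the value falls through to the final error case.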
Kubernetes Integration
Liveness Probe with Monitoring
Create a health check endpoint that also reports to monitoring:
# health.py
import os

import requests
from flask import Flask, jsonify

app = Flask(__name__)
MONITOR_URL = os.getenv('TELEMETRY_MONITOR_URL')


@app.route('/health')
def health():
    # Check application health
    healthy = check_app_health()

    # Report to monitoring (async in production); skipped if no URL is configured
    if MONITOR_URL:
        status = "success" if healthy else "error"
        try:
            requests.post(MONITOR_URL, json={"status": status}, timeout=2)
        except requests.RequestException:
            pass  # Don't fail health check if monitoring fails

    if healthy:
        return jsonify({"status": "healthy"}), 200
    return jsonify({"status": "unhealthy"}), 503


def check_app_health():
    # Your health check logic
    return True
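The "async in production" comment can be realized in several ways. One option, sketched here with a hypothetical BackgroundReporter class, is a worker thread fed by a bounded queue, so the /health handler returns immediately and the liveness probe never waits on an outbound request. The send callable is an assumption: it stands for whatever function performs the actual POST.

```python
import queue
import threading


class BackgroundReporter:
    """Send check-ins from a worker thread so /health never waits on the network."""

    def __init__(self, send, maxsize=100):
        self._send = send  # callable(status) that performs the real POST
        self._queue = queue.Queue(maxsize=maxsize)
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, status):
        """Queue a status without blocking; return False if the queue is full."""
        try:
            self._queue.put_nowait(status)
            return True
        except queue.Full:
            return False

    def _run(self):
        while True:
            status = self._queue.get()
            try:
                self._send(status)
            except Exception as exc:  # a failed check-in must not kill the worker
                print(f"check-in failed: {exc}")
            finally:
                self._queue.task_done()

    def flush(self):
        """Block until everything queued so far has been handled."""
        self._queue.join()
```

In health.py this could be wired up as reporter = BackgroundReporter(lambda s: requests.post(MONITOR_URL, json={"status": s}, timeout=2)), with the handler calling reporter.submit("success" if healthy else "error"). Dropping check-ins when the queue is full is a deliberate trade-off: a slow monitoring endpoint then costs a missed ping rather than memory growth.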
Kubernetes deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector: # required for apps/v1 Deployments
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp # must match spec.selector
    spec:
      containers:
        - name: myapp
          image: myapp:latest
          env:
            - name: TELEMETRY_MONITOR_URL
              value: "https://telemetry.host/ping/YOUR_MONITOR_ID"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 300 # Every 5 minutes
CronJob Monitoring
Monitor Kubernetes CronJobs:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: database-backup
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: postgres:15
              env:
                - name: MONITOR_URL
                  value: "https://telemetry.host/ping/PROJECT_KEY/timeout/26h/k8s-backup?create=1"
              command:
                - /bin/bash
                - -c
                - |
                  # pipefail makes a pg_dump failure abort the job instead of being
                  # hidden by gzip's exit status. curl must be available in the image.
                  set -eo pipefail
                  pg_dump -h postgres mydb | gzip > /backup/mydb.sql.gz
                  echo "Backup completed" | curl -X POST "$MONITOR_URL" \
                    -H "Content-Type: text/plain" --data-binary @-
          restartPolicy: OnFailure
Job Success/Failure Monitoring
Monitor job completion:
apiVersion: batch/v1
kind: Job
metadata:
  name: data-migration
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: myapp:latest
          env:
            - name: MONITOR_ID
              value: "YOUR_MONITOR_ID"
          command:
            - /bin/sh
            - -c
            - |
              if /app/migrate.sh; then
                curl -X POST https://telemetry.host/ping/$MONITOR_ID \
                  -d '{"status":"success","message":"Migration completed"}'
              else
                curl -X POST https://telemetry.host/ping/$MONITOR_ID \
                  -d '{"status":"error","message":"Migration failed"}'
                exit 1
              fi
      restartPolicy: Never
  backoffLimit: 3
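When the job image already contains Python, the same run-then-report pattern can be expressed as a small wrapper. The sketch below uses a hypothetical run_and_report function; the report callable is an assumption standing for whatever sends the check-in. Unlike the shell version, which always exits with 1 on failure, it passes the command's own exit code through.

```python
import subprocess
import sys


def run_and_report(command, report):
    """Run command, report the outcome, and return the command's exit code."""
    result = subprocess.run(command)
    if result.returncode == 0:
        status, message = "success", "Migration completed"
    else:
        status, message = "error", f"Migration failed (exit code {result.returncode})"
    try:
        report(status, message)
    except Exception as exc:
        # A failed check-in must not mask the job's real result.
        print(f"check-in failed: {exc}", file=sys.stderr)
    return result.returncode
```

A typical entry point would be sys.exit(run_and_report(["/app/migrate.sh"], send_check_in)), so Kubernetes still sees a failed pod and applies backoffLimit as usual.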
Docker Swarm
Monitor services in Docker Swarm:
#!/bin/bash
# swarm-monitor.sh
SERVICE_NAME="myapp"
MONITOR_ID="your-monitor-id"
# Get service status
REPLICAS=$(docker service ls --filter "name=$SERVICE_NAME" --format '{{.Replicas}}')
if echo "$REPLICAS" | grep -q '/'; then
  RUNNING=$(echo "$REPLICAS" | cut -d'/' -f1)
  DESIRED=$(echo "$REPLICAS" | cut -d'/' -f2)
  if [ "$RUNNING" = "$DESIRED" ] && [ "$RUNNING" -gt 0 ]; then
    curl -X POST https://telemetry.host/ping/$MONITOR_ID \
      -d "{\"status\":\"success\",\"message\":\"$RUNNING/$DESIRED replicas running\"}"
  else
    curl -X POST https://telemetry.host/ping/$MONITOR_ID \
      -d "{\"status\":\"error\",\"message\":\"Only $RUNNING/$DESIRED replicas running\"}"
  fi
else
  curl -X POST https://telemetry.host/ping/$MONITOR_ID \
    -d '{"status":"error","message":"Service not found"}'
fi
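The replica parsing is the fragile part of this script, because the Replicas column can carry extra text after the counts in some Docker versions (for example a "(max 1 per node)" suffix). A sketch of a more tolerant parser follows; parse_replicas and service_status are hypothetical names, and the messages mirror the ones in swarm-monitor.sh.

```python
import re


def parse_replicas(text):
    """Parse a Replicas value such as "3/3" into (running, desired).

    Returns None when the text does not start with "<int>/<int>".
    """
    match = re.match(r"\s*(\d+)/(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def service_status(text):
    """Return the (status, message) pair the shell script would send."""
    parsed = parse_replicas(text)
    if parsed is None:
        return "error", "Service not found"
    running, desired = parsed
    if running == desired and running > 0:
        return "success", f"{running}/{desired} replicas running"
    return "error", f"Only {running}/{desired} replicas running"
```

Comparing integers rather than strings also avoids treating "03/3" and "3/3" as different values.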
Docker Events Monitoring
Monitor Docker events for container crashes:
#!/usr/bin/env python3
# docker-event-monitor.py
import os

import docker
import requests

client = docker.from_env()
MONITOR_ID = os.getenv('MONITOR_ID')


def send_check_in(status, message):
    try:
        requests.post(
            f'https://telemetry.host/ping/{MONITOR_ID}',
            json={'status': status, 'message': message},
            timeout=5
        )
    except Exception as e:
        print(f"Failed to send check-in: {e}")


# Monitor container events (blocks forever, yielding one dict per event)
for event in client.events(decode=True):
    if event.get('Type') != 'container':
        continue
    status = event.get('status')
    attributes = event.get('Actor', {}).get('Attributes', {})
    container_name = attributes.get('name', 'unknown')
    if status == 'die':
        # exitCode arrives as a string, so compare against '0'
        exit_code = attributes.get('exitCode', 'unknown')
        if exit_code != '0':
            send_check_in('error',
                          f"Container {container_name} died with exit code {exit_code}")
    elif status == 'health_status: unhealthy':
        send_check_in('error',
                      f"Container {container_name} became unhealthy")
Run as a service:
# /etc/systemd/system/docker-monitor.service
[Unit]
Description=Docker Event Monitor
After=docker.service
Requires=docker.service
[Service]
Environment="MONITOR_ID=your-monitor-id"
ExecStart=/usr/local/bin/docker-event-monitor.py
Restart=always
[Install]
WantedBy=multi-user.target
Best Practices
1. Separate Health Checks from Monitoring
Don’t let monitoring failures affect container health:
# ✅ Good: Health check succeeds even if monitoring fails
curl -f http://localhost:8080/health && \
(curl -X POST https://telemetry.host/ping/{ID} || true)
2. Use Auto Mode for Scaled Services
For services with auto-scaling:
environment:
- MONITOR_URL=https://telemetry.host/ping/PROJECT_KEY/auto/scaled-service?create=1
Auto mode adapts to changing check-in frequency as replicas scale.
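The check-in URLs in this guide follow two shapes: /ping/KEY/auto/NAME and /ping/KEY/timeout/DURATION/NAME, optionally followed by ?create=1. A small helper can keep them consistent across services. This is a sketch inferred only from those examples, not from a published API specification; the ping_url function and its parameters are hypothetical.

```python
from urllib.parse import quote

BASE = "https://telemetry.host/ping"


def ping_url(project_key, name, timeout=None, create=True):
    """Build a check-in URL in the shapes used in this guide.

    timeout=None selects auto mode; otherwise pass a value such as "26h" or "7m".
    """
    mode = "auto" if timeout is None else f"timeout/{quote(timeout, safe='')}"
    url = f"{BASE}/{quote(project_key, safe='')}/{mode}/{quote(name, safe='')}"
    return url + "?create=1" if create else url
```

For example, ping_url("PROJECT_KEY", "scaled-service") reproduces the auto-mode URL above, and ping_url("PROJECT_KEY", "k8s-backup", timeout="26h") reproduces the CronJob URL.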
3. Monitor at Multiple Levels
- Container level: Individual container health
- Service level: Overall service availability
- Job level: Batch job completion
4. Set Appropriate Intervals
Set the monitor timeout slightly longer than the health check interval, so a check-in that arrives a little late does not raise an alert:
healthcheck:
  interval: 5m # Check every 5 minutes
# Set the monitor timeout to 6-7 minutes to leave a margin for a late check-in
# https://telemetry.host/ping/KEY/timeout/7m/container
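The 6-7 minute suggestion is the 5-minute interval plus a 20-40% margin. The arithmetic can be sketched as follows; the function names and the default 40% margin are assumptions chosen to reproduce the 7m value, and integer math is used to avoid floating-point rounding surprises.

```python
def monitor_timeout_seconds(interval_seconds, grace_percent=40):
    """Return the health check interval plus a grace margin, in whole seconds."""
    return interval_seconds * (100 + grace_percent) // 100


def as_minutes(seconds):
    """Format a duration as whole minutes, rounding up, e.g. 420 -> "7m"."""
    return f"{-(-seconds // 60)}m"
```

With a 300-second interval, the default margin gives 420 seconds ("7m") and a 20% margin gives 360 seconds ("6m").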
5. Include Context in Messages
CONTAINER_ID=$(hostname)
curl -X POST https://telemetry.host/ping/{ID} \
-d "{\"status\":\"success\",\"message\":\"Container $CONTAINER_ID healthy\"}"
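Hand-escaped JSON in shell breaks as soon as the message contains a quote or backslash. When the check-in is sent from Python, json.dumps handles the escaping. The sketch below uses a hypothetical build_payload function and, like the shell example, falls back to the hostname, which inside a container defaults to the short container ID.

```python
import json
import socket


def build_payload(status, detail, container_id=None):
    """Return a JSON body whose message names the container it came from."""
    container_id = container_id or socket.gethostname()
    message = f"Container {container_id} {detail}"
    return json.dumps({"status": status, "message": message})
```

The returned string can be passed as the request body of the POST to the check-in URL.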
Troubleshooting
Health Checks Pass But No Monitoring
Check:
- Container has internet access
- DNS resolution works inside container
- Firewall allows outbound HTTPS
- Test manually:
docker exec <container> curl https://telemetry.host
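If curl is not installed in the container but Python is, the first three items on this checklist can be probed with the standard library. This is a rough diagnostic sketch with a hypothetical diagnose function: it checks DNS resolution and then a TCP connection to port 443, but it does not perform a TLS handshake or send an HTTP request.

```python
import socket


def diagnose(host, port=443, timeout=5):
    """Check DNS resolution, then a TCP connection, and report each step."""
    result = {"dns": False, "tcp": False}
    try:
        socket.getaddrinfo(host, port)
        result["dns"] = True
    except socket.gaierror:
        return result  # name does not resolve; no point trying to connect
    try:
        with socket.create_connection((host, port), timeout=timeout):
            result["tcp"] = True
    except OSError:
        pass  # refused, timed out, or unreachable (often a firewall)
    return result
```

Running diagnose("telemetry.host") inside the container and getting dns True but tcp False usually points at a firewall or egress policy rather than at DNS.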
Monitoring Works But Health Checks Fail
Check:
- Health check timeout is sufficient
- Application starts before first health check
- start_period is long enough for initialization
Too Many False Positives
Causes:
- Health check interval too aggressive
- Network hiccups causing temporary failures
- Cold starts taking longer than expected
Solutions:
- Increase interval and timeout
- Use retries to allow transient failures
- Increase start_period for slow-starting apps
Next Steps
- Learn about SMART disk monitoring
- Explore CI/CD integration examples
- See microservices monitoring