Cron Job Monitoring: Best Practices for 2025

Learn how to monitor cron jobs effectively and avoid the most common pitfalls that lead to silent failures

Telemetry.host Team

Cron jobs are the backbone of automated system maintenance, but they’re also notoriously prone to silent failures. A backup script that hasn’t run in weeks, a cleanup job that’s been failing for months, a critical report that never gets generated: these scenarios are all too common.

In this post, we’ll explore best practices for monitoring cron jobs in 2025 and show you how to catch failures before they become disasters.

The Problem with Cron

Cron is great at scheduling tasks, but terrible at telling you when things go wrong. In a typical setup:

  • Job output is emailed to a local mailbox (that nobody reads), or dropped when no mail agent is configured
  • Output redirected to /dev/null is lost forever
  • Exit codes are ignored
  • There is no centralized visibility

This leads to silent failures: your cron jobs stop working, but you don’t know until it’s too late.

Best Practice #1: Always Capture Output

Never discard your script output:

# ❌ Bad: Output lost
0 2 * * * /path/to/backup.sh > /dev/null 2>&1

# ✅ Good: Output captured for debugging
# (a crontab entry must stay on one line; cron does not support backslash continuation)
0 2 * * * /path/to/backup.sh 2>&1 | curl -X POST -H "Content-Type: text/plain" --data-binary @- https://telemetry.host/ping/YOUR_ID

When something fails, you’ll have the full context to debug it.
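One caveat with piping a job straight into curl: the pipeline's exit status is curl's, not the job's, so the job's own failure code is lost. The following is a minimal sketch of one way to keep both the output and the status, assuming you are free to wrap the job in a small helper. The function name run_captured and the log file location are hypothetical and not part of any Telemetry.host tooling.

```shell
# run_captured LOGFILE COMMAND [ARGS...]
# Runs COMMAND, appends its combined stdout/stderr to LOGFILE between a
# header and a footer line, and returns COMMAND's own exit status.
run_captured() {
    local logfile=$1
    shift
    printf '=== %s: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >> "$logfile"
    local status=0
    "$@" >> "$logfile" 2>&1 || status=$?
    printf '=== exit status: %d\n' "$status" >> "$logfile"
    return "$status"
}
```

The log file can then be posted with the same curl call used above (for example with --data-binary @"$logfile"), while the returned status decides whether the run counts as a success.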

Best Practice #2: Check Exit Codes

Always verify your script succeeded:

#!/bin/bash
set -e  # Exit on any error

# Your commands here
pg_dump mydb > backup.sql
gzip backup.sql

# If we reach here, everything succeeded
echo "Backup completed successfully"
exit 0

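set -e makes the script exit as soon as a command returns a non-zero status, so a failure surfaces as a non-zero exit code instead of being skipped over. It does not catch failures inside a pipeline, though; for that, add set -o pipefail, as the full example later in this post does.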
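Pipelines deserve extra care here, because a pipeline's status is normally that of its last command only. The short experiment below illustrates the difference; it assumes bash is installed and uses false | true as a stand-in for something like pg_dump failing while gzip succeeds.

```shell
# With set -e alone, the failing left-hand side of the pipe goes unnoticed
# and the script carries on to the echo.
plain=$(bash -c 'set -e; false | true; echo "continued"' 2>&1) && plain_rc=0 || plain_rc=$?

# With pipefail, the pipeline reports the failure and set -e stops the script.
strict=$(bash -c 'set -eo pipefail; false | true; echo "continued"' 2>&1) && strict_rc=0 || strict_rc=$?

echo "set -e only:       output='$plain' exit=$plain_rc"
echo "set -e + pipefail: output='$strict' exit=$strict_rc"
```

The first run prints "continued" and exits 0; the second prints nothing and exits 1.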

Best Practice #3: Use Meaningful Timeouts

Set realistic timeouts that account for occasional delays:

# Daily backup at 2 AM
# Use a 25-26 hour timeout (not exactly 24h)
# This allows for occasional delays without false alarms

https://telemetry.host/ping/PROJECT_KEY/timeout/26h/daily-backup?create=1

Too aggressive? False positives. Too loose? Real failures go unnoticed.
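One way to arrive at a figure like 26h is to add an explicit grace period to the schedule interval. The helper below only illustrates that arithmetic: the function names and the two-hour grace value are assumptions, and the URL layout is copied from the example above rather than from any API reference.

```shell
# monitor_timeout INTERVAL_HOURS GRACE_HOURS
# Prints the timeout as "<hours>h": the schedule interval plus a grace period.
monitor_timeout() {
    echo "$(( $1 + $2 ))h"
}

# monitor_url PROJECT_KEY TIMEOUT NAME
# Prints a ping URL in the same layout as the example above.
monitor_url() {
    printf 'https://telemetry.host/ping/%s/timeout/%s/%s?create=1\n' "$1" "$2" "$3"
}

# Daily job (24h interval) with 2h of slack for slow runs
hours=$(monitor_timeout 24 2)
monitor_url PROJECT_KEY "$hours" daily-backup
```

For an hourly job the same reasoning applies at a smaller scale; the grace period should cover the job's worst-case runtime plus some slack.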

Best Practice #4: Test Your Monitoring

Before deploying, test both success and failure scenarios:

# Test success
./backup.sh && echo "✅ Success case works"

# Test failure (--force-error stands in for whatever makes your script fail)
./backup.sh --force-error || echo "✅ Failure case detected"

Verify you receive notifications for failures.
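These checks can also be made repeatable with a tiny assertion helper. This is a sketch, not an established tool: assert_exit is a hypothetical name, and true and false stand in for your script's success and failure paths.

```shell
# assert_exit EXPECTED COMMAND [ARGS...]
# Runs COMMAND quietly and compares its exit status with EXPECTED.
assert_exit() {
    local expected=$1
    shift
    local actual=0
    "$@" >/dev/null 2>&1 || actual=$?
    if [ "$actual" -eq "$expected" ]; then
        echo "PASS: '$*' exited with $actual"
    else
        echo "FAIL: '$*' exited with $actual, expected $expected"
        return 1
    fi
}

assert_exit 0 true     # stand-in for the success case
assert_exit 1 false    # stand-in for the failure case
```

In practice you would replace true and false with ./backup.sh and a deliberately broken invocation of it. An exit-code check like this does not replace confirming that the alert actually arrives.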

Best Practice #5: Include Context in Reports

Don’t just report “success” or “failure”. Include actionable information:

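# ❌ Bad: No context
echo "Done" | curl -X POST "$MONITOR_URL" --data-binary @-

# ✅ Good: Actionable information
# (--data-binary @- is what makes curl send the piped text as the request body)
echo "Backup completed: 2.5GB in 120 seconds, 15 tables backed up" | \
  curl -X POST "$MONITOR_URL" --data-binary @-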

When debugging at 3 AM, you’ll thank yourself for the extra details.

Best Practice #6: Monitor Critical Dependencies

If your cron job depends on external services, monitor those too:

#!/bin/bash

# Check prerequisites
# (--data-binary @- sends the piped message as the request body)
if ! pg_isready -q; then
    echo "Database not available" | curl -X POST "$MONITOR_URL" --data-binary @-
    exit 1
fi

if [ ! -d "/backups" ]; then
    echo "Backup directory missing" | curl -X POST "$MONITOR_URL" --data-binary @-
    exit 1
fi

# Proceed with backup...
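The tools the script calls are dependencies too, and cron usually runs jobs with a much shorter PATH than an interactive shell, so a command that works at the prompt may not be found at 2 AM. A minimal sketch of a check for that, with missing_commands as a hypothetical helper name:

```shell
# missing_commands CMD...
# Prints one line per command that cannot be found on PATH and
# returns non-zero if at least one is missing.
missing_commands() {
    local cmd
    local missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing command: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Example use near the top of a backup script:
#   missing_commands pg_dump gzip curl || exit 1
```

The printed lines can be piped to the monitor in the same way as the other prerequisite messages.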

Best Practice #7: Use Auto-Provisioning

Define monitors in your scripts using auto-provisioning URLs:

# This URL will create the monitor if it doesn't exist
MONITOR_URL="https://telemetry.host/ping/PROJECT_KEY/timeout/26h/db-backup?create=1"

# Now deploy this script anywhere; monitoring is automatic
./backup.sh 2>&1 | curl -X POST "$MONITOR_URL" \
  -H "Content-Type: text/plain" --data-binary @-

Perfect for infrastructure-as-code and dynamic environments.

Best Practice #8: Separate Concerns

For complex jobs, monitor each critical step separately:

# Stop at the first failing step so later "success" pings are never sent
set -euo pipefail

# Monitor backup creation
pg_dump mydb | gzip > backup.sql.gz
curl -X POST "$BACKUP_MONITOR" -d '{"status":"success"}'

# Monitor backup upload
aws s3 cp backup.sql.gz s3://backups/
curl -X POST "$UPLOAD_MONITOR" -d '{"status":"success"}'

# Monitor backup verification
gunzip -t backup.sql.gz
curl -X POST "$VERIFY_MONITOR" -d '{"status":"success"}'

This helps pinpoint exactly where failures occur.
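When a job has many steps, the run-then-ping pattern can be folded into a helper so that a ping is only ever sent after its step succeeds. This is a sketch under assumptions: run_step and ping_monitor are hypothetical names, and the JSON body is simply the one used in the example above.

```shell
# ping_monitor URL
# Reports success for one step; fails if the request fails or times out.
ping_monitor() {
    curl -fsS -m 10 -X POST "$1" -d '{"status":"success"}' >/dev/null
}

# run_step URL COMMAND [ARGS...]
# Runs COMMAND; pings URL only if it succeeded, otherwise returns
# COMMAND's exit status without pinging.
run_step() {
    local url=$1
    shift
    "$@" || return $?
    ping_monitor "$url"
}

# Example use:
#   run_step "$UPLOAD_MONITOR" aws s3 cp backup.sql.gz s3://backups/
#   run_step "$VERIFY_MONITOR" gunzip -t backup.sql.gz
```

A step that involves a pipe or redirection needs to be wrapped in its own function or script before being passed to run_step, since only a single command and its arguments are accepted.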

Best Practice #9: Plan for Maintenance

Account for scheduled downtime in your monitoring:

# Use wider timeout during maintenance windows
# Normal: 25 hours
# During maintenance: 50 hours

if [ -f /etc/maintenance-mode ]; then
    TIMEOUT="50h"
else
    TIMEOUT="25h"
fi

curl -X POST "https://telemetry.host/ping/KEY/timeout/$TIMEOUT/backup"

Best Practice #10: Review Monitoring Regularly

Set a recurring task to review your monitors:

  • Are timeouts still appropriate?
  • Are false positives occurring?
  • Are notifications reaching the right people?
  • Are old monitors still needed?

Real-World Example

Here’s a production-ready backup script incorporating these practices:

#!/bin/bash
# production-backup.sh

set -euo pipefail

# Configuration
DB_NAME="production"
BACKUP_DIR="/backups"
MONITOR_URL="https://telemetry.host/ping/KEY/timeout/26h/prod-backup?create=1"
RETENTION_DAYS=30

# Validate prerequisites
[[ -d "$BACKUP_DIR" ]] || { echo "Backup dir missing"; exit 1; }
pg_isready -q || { echo "DB not ready"; exit 1; }

# Perform backup
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="$BACKUP_DIR/${DB_NAME}_${TIMESTAMP}.sql.gz"
START_TIME=$(date +%s)

echo "Starting backup..."
pg_dump "$DB_NAME" | gzip > "$BACKUP_FILE"

# Verify
gunzip -t "$BACKUP_FILE" || { echo "Verification failed"; exit 1; }

# Cleanup old backups
find "$BACKUP_DIR" -name "*.sql.gz" -mtime +$RETENTION_DAYS -delete

# Report success
END_TIME=$(date +%s)
DURATION=$((END_TIME - START_TIME))
SIZE=$(du -h "$BACKUP_FILE" | cut -f1)
REMAINING=$(find "$BACKUP_DIR" -name "*.sql.gz" | wc -l)

{
    echo "✅ Backup completed successfully"
    echo "Duration: ${DURATION}s"
    echo "Size: $SIZE"
    echo "Backups retained: $REMAINING"
} | curl -X POST "$MONITOR_URL" \
    -H "Content-Type: text/plain" --data-binary @-

echo "Done"
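As written, the script only reports on success; a failed run is detected when the timeout expires, without any details. If you want failures to carry context as well, an EXIT trap is one option. The sketch below is an assumption-heavy illustration: FAIL_URL is a hypothetical endpoint (this post does not say how Telemetry.host distinguishes failure reports), and report_failure is a made-up helper name.

```shell
# Hypothetical endpoint for failure reports; replace it with whatever your
# monitoring service expects. The default here is a placeholder only.
FAIL_URL="${FAIL_URL:-https://example.invalid/fail}"

# report_failure STATUS
# Sends a short failure note unless STATUS is 0. Errors from curl are
# ignored so the trap never masks the script's original exit status.
report_failure() {
    local status=$1
    if [ "$status" -eq 0 ]; then
        return 0
    fi
    printf 'Backup FAILED with exit status %d on %s\n' "$status" "$(uname -n)" |
        curl -fsS -m 10 -X POST "$FAIL_URL" \
            -H "Content-Type: text/plain" --data-binary @- >/dev/null || true
}

# In the backup script, install it right after `set -euo pipefail`:
#   trap 'report_failure $?' EXIT
```

With the trap installed, any exit caused by set -e or by one of the explicit exit 1 branches sends a note, while a clean run sends only the success report.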

Conclusion

Monitoring cron jobs doesn’t have to be complicated. By following these best practices, you can:

  • Catch failures early before they impact users
  • Debug faster with full context logs
  • Sleep better knowing your critical jobs are monitored

The key is to treat monitoring as a first-class concern, not an afterthought. Build it into your scripts from day one, and you’ll save yourself countless hours of debugging and firefighting.

Get Started

Ready to implement these practices? Check out our quickstart guide to set up your first monitor in 5 minutes.

For more examples, see our cron monitoring guide and backup monitoring examples.