PostgreSQL Backup System for Docker with S3 and Slack


At 2:23 AM last Tuesday, something went wrong with our database. The most recent backup was a few days old, so recovery took longer than it should have. That's when I realized our backup system had to be better.

Managed databases can be expensive to run. Going self-managed is the alternative, but it puts the responsibility for resiliency on you. For me, that means building a backup system that:

  • Runs automatically at 2 AM

  • Retries up to 10 times if something fails

  • Sends Slack notifications on success or failure

  • Stores backups in S3

  • Makes restoration a single command

No database was harmed in the making of this system. Here's how I did it.

The Stack

  • Database: PostgreSQL 17 in Docker

  • Storage: AWS S3 with lifecycle policies

  • Orchestration: Bash scripts + cron

  • Notifications: Slack webhooks

  • Backup tool: pg_dump (PostgreSQL's built-in tool)

Architecture Overview

The backup system has three main components:

  1. The Backup Script - Handles the actual pg_dump, S3 upload, and local cleanup. Includes retry logic for resilience.

  2. Cron Jobs - Schedules backups (production daily, staging weekly).

  3. Notification Webhook - POSTs status updates directly to Slack webhook.

This is straightforward: cron triggers the script → the script dumps the database → uploads to S3 → sends a notification → cleans up old local files.

Implementation

Step 1: Configure AWS and Docker

AWS Setup:

First, configure the AWS CLI on your host machine and create S3 buckets for your backups. Install the AWS CLI first if it isn't already there.

aws configure
# Enter your credentials and region

# Test access
aws s3 ls
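
The scripts later in this post assume one bucket per environment. If you haven't created them yet, a minimal sketch (the bucket names are the examples used throughout; yours must be globally unique):

# create the backup buckets used in the scripts below
aws s3 mb s3://myapp-backups --region us-east-1
aws s3 mb s3://myapp-staging-backups --region us-east-1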

Docker Setup:

Add a volume mount to your database service in compose.yml:

services:
  db:
    image: postgres:17.2-bookworm
    volumes:
      - db_data:/var/lib/postgresql/data
      - ./backups:/backups  # Add this for backup files (the backup script assumes this resolves to ~/backups on the host)
    environment:
      POSTGRES_DB: myapp_db
      POSTGRES_USER: myapp_user
      POSTGRES_PASSWORD: ${DB_PASSWORD}
volumes:
  db_data:

Restart your containers to apply the changes:

docker compose up -d
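
To confirm the mount is in place before relying on it (assumes the service is named db as in the snippet above):

# the backup directory should exist inside the container
docker compose exec db ls -ld /backups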

Production Note: I actually use Docker Swarm in production/staging for better orchestration. The backup strategy is pretty much the same. Just update container filters to docker ps -qf "name=stackname_db.1" instead of "name=projectname_db". I'll cover this in the Production Considerations section.

Step 2: The Backup Script

Create ~/scripts/backup.sh with three key functions:

  1. run_backup() - Executes pg_dump via docker exec, uploads to S3, cleans up old local files

  2. send_notification() - POSTs backup status directly to Slack

  3. Retry logic - retries up to 10 times at 10-minute intervals after an initial failure

Here's the simplified structure:

#!/bin/bash
set -euo pipefail

ENV="${1:-prod}"
MAX_RETRIES=10
RETRY_INTERVAL=600  # 10 minutes

# load configs (Slack webhook, AWS region, etc.)
source ~/.backup_env

# environment-specific config
if [ "$ENV" == "prod" ]; then
    DB_NAME="myapp_prod"
    DB_USER="myapp_user"
    S3_BUCKET="myapp-backups"
else
    DB_NAME="myapp_staging"
    DB_USER="myapp_user"
    S3_BUCKET="myapp-staging-backups"
fi

BACKUP_FILE="backup_$(date +%Y-%m-%dT%H-%M-%S).dump"

send_notification() {
    local status=$1
    local error_msg=${2:-""}
    
    # determine color
    local color="good"
    [[ "$status" == *"failure"* ]] && color="danger"
    
    # build the Slack payload
    local payload=$(cat <<EOF
{
  "text": "DB Backup for ${ENV}: ${status}",
  "attachments": [{
    "color": "${color}",
    "fields": [
      {"title": "Environment", "value": "${ENV}", "short": true},
      {"title": "Status", "value": "${status}", "short": true},
      {"title": "Backup File", "value": "${BACKUP_FILE}", "short": false}
      $([ -n "$error_msg" ] && echo ", {\"title\": \"Error\", \"value\": \"${error_msg}\", \"short\": false}")
    ]
  }]
}
EOF
)
    
    # send to Slack
    curl -X POST "$SLACK_WEBHOOK_URL" \
        -H "Content-Type: application/json" \
        -d "$payload" --silent --show-error
}

run_backup() {
    # find database container
    local container_id=$(docker ps -qf "name=db")
    
    # run pg_dump inside container
    docker exec "$container_id" \
        pg_dump -U $DB_USER -d $DB_NAME \
        -Fc -b -v -f /backups/$BACKUP_FILE
    
    # Upload to S3
    aws s3 cp ~/backups/$BACKUP_FILE \
        s3://$S3_BUCKET/ --region us-east-1
    
    # Keep only last 2 local backups
    ls -t ~/backups/*.dump | tail -n +3 | xargs -r rm
}

# Main execution with retry logic
if run_backup; then
    send_notification "success"
    exit 0
else
    send_notification "first_failure" "Initial backup failed"
    
    for i in $(seq 1 $MAX_RETRIES); do
        sleep $RETRY_INTERVAL
        if run_backup; then
            send_notification "success_after_retry"
            exit 0
        fi
    done
    
    send_notification "final_failure" "All retries exhausted"
    exit 1
fi

Breaking down the pg_dump invocation the script runs via docker exec:

docker exec 42f3acbe6d70 pg_dump -U $DB_USER -d $DB_NAME -Fc -b -v -f /backups/$BACKUP_FILE

  • docker exec - execute a command in a running container, identified here by its ID (42f3acbe6d70)

  • pg_dump - export a PostgreSQL database as an SQL script or archive file

  • -U - user name to connect as

  • -d - name of the database to dump

  • -Fc - select the output format; c is the custom format, a compressed archive suitable for pg_restore

  • -b - include large objects in the dump

  • -v - verbose mode

  • -f - path of the output file (here on the /backups volume mounted earlier)

(Don't be tempted to run this with docker exec -d; the script needs pg_dump to finish before it uploads the file to S3.)

Key decisions explained:

  • Why -Fc format? Custom format is compressed and allows selective restoration of specific tables.

  • Why retry logic? Networks fail. AWS can hiccup. Retries make the system resilient.

  • Why keep 2 local backups? Quick access for recent restores without hitting S3, but doesn't fill the disk.

  • Why direct Slack posting? Fewer dependencies. If you already have a notification service, you could POST there instead and let it handle formatting. All you need from Slack is an incoming webhook URL; a quick way to test it is shown right after this list.

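To verify the webhook before wiring it into the script, a one-off test message is enough (uses the same placeholder URL as the env file below):

# send a test message to the incoming webhook
curl -X POST "https://hooks.slack.com/services/YOUR/WEBHOOK/URL" \
    -H "Content-Type: application/json" \
    -d '{"text": "backup webhook test"}'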

Make the script executable:

chmod +x ~/scripts/backup.sh

Create the environment file with your secrets:

cat > ~/.backup_env << 'EOF'
SLACK_WEBHOOK_URL="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
EOF

chmod 600 ~/.backup_env # only the file's owner has read and write access
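
Before handing the script to cron, it's worth one manual run against staging to confirm the whole chain works (dump, S3 upload, Slack message):

# manual end-to-end test against staging
~/scripts/backup.sh staging

# confirm the dump landed in S3
aws s3 ls s3://myapp-staging-backups/ | tail -n 3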

Step 3: Automation with Cron

Schedule the backups to run automatically:

# create the log directory first
mkdir -p ~/.db_backup_logs

crontab -e

# add these lines:
# Production - daily at 2 AM
0 2 * * * /home/username/scripts/backup.sh prod >> /home/username/.db_backup_logs/backup-prod.log 2>&1

# Staging - Sundays at 2 AM
0 2 * * 0 /home/username/scripts/backup.sh staging >> /home/username/.db_backup_logs/backup-staging.log 2>&1

# Cleanup old logs - daily at 3 AM
0 3 * * * find /home/username/.db_backup_logs -name "*.log" -mtime +30 -delete
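
A quick check that the entries were saved:

# list the installed jobs
crontab -l | grep backup.sh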

Step 4: The Restoration Script

Backups are useless if you can't restore them. You could restore manually, but why not script that as well? Create ~/scripts/db-restore.sh:

#!/bin/bash
set -euo pipefail

# Usage: ./db-restore.sh prod backup_2025-12-13T02-00-00.dump [--full]

ENV="${1}"
BACKUP_FILE="${2}"
FULL_RESTORE="${3:-}"

# Environment-specific target database (mirrors backup.sh)
if [ "$ENV" == "prod" ]; then
    DB_NAME="myapp_prod"
else
    DB_NAME="myapp_staging"
fi
DB_USER="myapp_user"

# Download from S3 (backups live in the production bucket)
mkdir -p ~/restore
aws s3 cp s3://myapp-backups/$BACKUP_FILE ~/restore/

# Copy to container
CONTAINER_ID=$(docker ps -qf "name=db")
docker cp ~/restore/$BACKUP_FILE $CONTAINER_ID:/tmp/

# Full restore: confirm, then drop and recreate the target database
if [ "$FULL_RESTORE" == "--full" ]; then
    read -p "This will DROP and recreate ${DB_NAME}. Continue? (y/N) " confirm
    [ "$confirm" == "y" ] || exit 1
    # connect to the postgres maintenance DB so the target can be dropped
    docker exec -i $CONTAINER_ID psql -U $DB_USER -d postgres <<EOF
DROP DATABASE IF EXISTS $DB_NAME;
CREATE DATABASE $DB_NAME;
EOF
fi

# Run pg_restore
docker exec $CONTAINER_ID \
    pg_restore -U $DB_USER -d $DB_NAME \
    --clean --if-exists -v /tmp/$BACKUP_FILE

# Cleanup
docker exec $CONTAINER_ID rm /tmp/$BACKUP_FILE
rm ~/restore/$BACKUP_FILE

Make it executable:

chmod +x ~/scripts/db-restore.sh

Testing restoration is as important as taking the backups. Do it regularly, perhaps monthly, by restoring the latest production backup into staging:

# Download latest prod backup
LATEST=$(aws s3 ls s3://myapp-backups/ | sort | tail -n 1 | awk '{print $4}')

# Restore to staging
./scripts/db-restore.sh staging $LATEST --full

Production Considerations

Docker Swarm vs Compose

In production, I use Docker Swarm instead of Compose for better orchestration and secrets management. It’s actually the same backup strategy with a few changes:

Container naming:

  • Compose: projectname_db_1

  • Stack: stackname_db.1.container_id

Finding containers:

# works for both
docker ps -qf "name=db"
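
If the host runs more than one project or stack with a db service, that loose filter can match several containers; the qualified names from earlier are safer (projectname/stackname are placeholders for your own):

# Compose project
docker ps -qf "name=projectname_db"

# Swarm stack (tasks get a numeric suffix)
docker ps -qf "name=stackname_db.1"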

Secrets management:

Docker Swarm handles secrets more securely:

services:
  db:
    secrets:
      - db_password
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password

secrets:
  db_password:
    external: true

Create the secret:

# I find stdin to be easier to work with
printf "your-database-password-here" | docker secret create db_password -

Then in the backup script, read the password from inside the container:

docker exec $CONTAINER_ID sh -c \
    "PGPASSWORD=\$(cat /run/secrets/db_password) pg_dump ..."

S3 Lifecycle Policies

If you want to prevent backup costs from growing forever, consider auto-deleting old backups:

  1. Go to S3 Console → Your bucket → Management → Lifecycle rules

  2. Create rule: Delete objects older than 30 days

  3. Apply it to the whole bucket, or to a prefix such as prod-db-backups/ if you organize backups under one (the script above uploads to the bucket root)
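
The same rule can be set from the CLI instead of the console (a sketch; this version expires every object in the bucket after 30 days, so adjust the Filter if you use a prefix):

# lifecycle rule: expire objects after 30 days
cat > lifecycle.json << 'EOF'
{
  "Rules": [
    {
      "ID": "expire-old-backups",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Expiration": {"Days": 30}
    }
  ]
}
EOF

aws s3api put-bucket-lifecycle-configuration \
    --bucket myapp-backups \
    --lifecycle-configuration file://lifecycle.json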

Monitoring

Beyond Slack notifications, you could also:

  • Review logs: tail -50 ~/.db_backup_logs/backup-prod.log

  • Check S3: aws s3 ls s3://myapp-backups/ | tail

  • Test restoration at intervals (as mentioned above)

The Result

  • Automated nightly backups

  • Slack notifications that keep us informed

  • A tested restoration process

  • Off-site storage in S3

  • and of course, peace of mind, heh…

S3 storage is far cheaper than a managed RDS instance, so this setup is part of keeping project costs low without sacrificing the safety net of reliable backups.

Try It Yourself

Full scripts are available in this GitHub Gist. The system is environment-agnostic; just update the configuration variables for your setup.

Questions? Drop them in the comments.


Originally published at blog.theolujay.dev