
Database Migrations Without Downtime: A Practical Guide

engineering · architecture

There's a special kind of anxiety that comes with running ALTER TABLE on a production database that handles thousands of transactions per minute. One wrong move and you're looking at a locked table, queued requests, and a very long night.

Over the years, I've developed a set of practices that make database migrations boring — which, in production operations, is exactly what you want.

The Golden Rule

Never make a breaking change in a single deployment. Every migration should be backward compatible with the currently running code. If the migration goes out and you need to roll back the application, the old code should still work with the new schema.

This means no dropping columns, no renaming columns, and no adding NOT NULL columns without defaults — at least not in a single step.
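In PostgreSQL, for instance, the multi-step version of "add a NOT NULL column" looks roughly like this (table and column names are illustrative):

```sql
-- Step 1: add the column as nullable (a fast, metadata-only change)
ALTER TABLE orders ADD COLUMN status text;

-- Step 2: backfill existing rows (in batches, for large tables)
UPDATE orders SET status = 'unknown' WHERE status IS NULL;

-- Step 3: only once every row has a value, add the constraint
ALTER TABLE orders ALTER COLUMN status SET NOT NULL;
```

Each step is safe to deploy on its own, and the running application keeps working between them.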

The Expand-Contract Pattern

Most schema changes follow a three-phase process:

Phase 1 — Expand. Add the new column, table, or index alongside the existing structure. Both old and new code work with the schema. Deploy the migration.

Phase 2 — Migrate. Deploy application code that writes to both old and new structures. Backfill existing data from old to new. Verify that the new structure has complete, correct data.

Phase 3 — Contract. Once you're confident in the new structure and the old code is no longer deployed, remove the old column or table.

Yes, this means a column rename takes three deployments instead of one. That's the price of zero downtime — and it's worth paying every time.
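As a concrete sketch, renaming a hypothetical users.fullname column to display_name spreads across the three phases like so:

```sql
-- Phase 1 (deploy 1): expand — add the new column alongside the old
ALTER TABLE users ADD COLUMN display_name text;

-- Phase 2 (deploy 2): application code now writes to both columns;
-- backfill the historical rows
UPDATE users SET display_name = fullname WHERE display_name IS NULL;

-- Phase 3 (deploy 3): contract — once no running code reads fullname
ALTER TABLE users DROP COLUMN fullname;
```

Between each phase, either the old or the new application code can run against the current schema, which is exactly what makes rollbacks safe.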

Handling Large Tables

Adding an index to a table with 100 million rows can lock the table for minutes or hours. In PostgreSQL, the solution is CREATE INDEX CONCURRENTLY — it builds the index without blocking reads or writes, at the cost of a slower build. (It can't run inside a transaction block, and a failed build leaves behind an invalid index that you must drop and retry.)
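For example (index and table names are illustrative):

```sql
-- Builds the index without blocking writes;
-- must be run outside a transaction block
CREATE INDEX CONCURRENTLY idx_transactions_created_at
    ON transactions (created_at);
```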

For large data backfills, I process in batches:

-- $start and $end advance by the batch size on each iteration
UPDATE transactions
SET new_column = computed_value
WHERE id BETWEEN $start AND $end;

Process 10,000 rows at a time with a small delay between batches. This keeps the database responsive while the backfill progresses. Monitor replication lag if you're using replicas — large writes can cause replicas to fall behind.
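The batching loop itself can live in a PL/pgSQL procedure (PostgreSQL 11+, which allows COMMIT inside procedures). This is a sketch: the table mirrors the snippet above, and computed_value stands in for your real expression:

```sql
CREATE PROCEDURE backfill_new_column()
LANGUAGE plpgsql AS $$
DECLARE
  rows_updated integer;
BEGIN
  LOOP
    -- Pick the next batch of unprocessed rows
    UPDATE transactions
    SET new_column = computed_value
    WHERE id IN (
      SELECT id FROM transactions
      WHERE new_column IS NULL
      LIMIT 10000
    );
    GET DIAGNOSTICS rows_updated = ROW_COUNT;
    EXIT WHEN rows_updated = 0;
    COMMIT;                  -- release locks between batches
    PERFORM pg_sleep(0.5);   -- brief pause to keep the database responsive
  END LOOP;
END;
$$;
```

Run it with CALL backfill_new_column(); the per-batch COMMIT is what keeps lock hold times short.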

Migration Checklist

Before every production migration, I verify:

  • Backward compatibility. Can the current application code run against the new schema?
  • Forward compatibility. Can the new application code run against the old schema? (Needed if the migration must be rolled back while the new code stays deployed.)
  • Lock analysis. Will the migration acquire locks that could block queries? For how long?
  • Data volume. How much data will be affected? What's the expected duration?
  • Rollback plan. If something goes wrong, how do we revert?
  • Monitoring. What should we watch during and after the migration?
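For the lock-analysis and monitoring items, a quick PostgreSQL query like this one (a sketch; adapt it to your setup) shows sessions currently stuck waiting on a lock during a migration:

```sql
SELECT pid,
       now() - query_start AS waiting_for,
       query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';
```

If this returns rows with growing wait times while your migration runs, it's blocking real traffic and may need to be aborted.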

The Human Factor

Technical practices aside, the most important migration safety measure is communication. Before running any migration:

  1. Notify the team that a migration is happening
  2. Share the expected duration and impact
  3. Have a rollback plan documented and agreed upon
  4. Ensure someone is monitoring the system throughout

Most migration disasters I've seen weren't caused by bad SQL — they were caused by someone running a migration without telling anyone, during peak traffic, with no rollback plan.

Tooling

I've used several migration tools across different stacks, but the principles are the same regardless of tooling:

  • Version your migrations with sequential, monotonic identifiers
  • Make migrations idempotent when possible — running the same migration twice shouldn't break anything
  • Test migrations against production-like data volumes before running them in production
  • Keep migrations small and focused — one logical change per migration
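Idempotency in PostgreSQL often comes down to guarded DDL (names here are illustrative):

```sql
-- Safe to re-run: each statement is a no-op if already applied
CREATE TABLE IF NOT EXISTS audit_log (
    id bigserial PRIMARY KEY,
    created_at timestamptz NOT NULL DEFAULT now()
);

ALTER TABLE audit_log ADD COLUMN IF NOT EXISTS actor_id bigint;

CREATE INDEX IF NOT EXISTS idx_audit_log_actor ON audit_log (actor_id);
```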

The goal is always the same: make schema changes routine, safe, and boring. Your users should never notice a migration happening. And your on-call engineer should never dread one.