Engineering Practitioner Brief / 18 May 2026

Database Schema Debt Cost

Application code can be rewritten in pieces. A live production database with 4 TB of customer data cannot. Schema debt is therefore the highest-stakes category of technical debt: the cost of carrying it is real but spread across years, and the cost of paying it down is concentrated into a few migration windows. This page breaks down both sides of that ledger.

Four schema-debt categories that account for most cost:

Missing or stale indexes (cheapest to fix, most common)
Denormalisation drift (silent corruption risk)
Missing referential integrity (orphaned-row cleanup tax)
Tables outgrown by their access pattern (largest migration cost)

Category 1: Missing or Stale Indexes

Indexes age out of usefulness in two ways. They were never created on a column that became a frequent filter predicate (a product manager added a new search facet, an engineer added a WHERE clause to an existing query, a new endpoint went live and nobody noticed the lookup pattern), or they were created early but the query pattern changed and the index is now unused. Both cases are easy to identify in modern databases.

For PostgreSQL, pg_stat_user_indexes reports per-index scan counts. Any index with zero scans across a meaningful time window is a candidate for removal. The pg_stat_statements extension surfaces the queries doing the most I/O, which is where missing indexes show up. For MySQL, the slow query log plus EXPLAIN against the worst offenders does the equivalent job. Percona Toolkit's pt-index-usage automates the unused-index search.

The economic case for adding an index is usually obvious once measured. A query that runs 100,000 times per day and drops from 200 ms to 2 ms after indexing saves 33 hours of CPU per day. On a database tier with 20 cores at $0.50 per core-hour, that is over $10,000 per year of CPU not bought. The economic case for removing an unused index is smaller but real: each index costs INSERT and UPDATE latency and consumes disk space. For a write-heavy table at million-row scale, removing five unused indexes can shave 15 to 25 percent off write latency.

Category 2: Denormalisation Drift

Caching the result of a join inside the primary table is fine when one team owns both. It becomes debt when the cache gets written from three code paths, only two of which know to update both copies. Over months the cached column drifts from the source of truth. The user-facing bugs that result (the customer's name on the invoice is wrong, the dashboard total disagrees with the detail view) are notoriously hard to reproduce because the inconsistency is data-state-dependent.

The remediation cost depends on the volume of drifted rows. A reconciliation job to detect and repair the divergence is typically 40 to 120 engineer-hours to write, plus ongoing run cost. The harder cost is the architectural change to prevent recurrence: centralising the writes through a single service interface, adding triggers, or moving to a materialised view. Each option has its own trade-offs. Materialised views in PostgreSQL are the cleanest pattern but require periodic refresh that scales poorly above a few million rows.

The lesson the case histories converge on: denormalise deliberately, for a measured benefit, with a single documented writer, and a periodic reconciliation job. Accidental denormalisation (someone copy-pasted a column from one table to another for a quick fix) is always debt.

Category 3: Missing Referential Integrity

Foreign keys carry a small write-time cost in exchange for a guarantee that referenced rows exist. Tables designed without foreign keys (sometimes for performance, sometimes by oversight) accumulate orphaned rows: child rows whose parents have been deleted, references to tenants that no longer exist, log rows pointing at users who were purged years ago. The orphaned rows do not break anything immediately. They surface in two ways: queries that join through them return surprising NULLs, and analytical queries over-count or under-count depending on the join direction.

The fix-up cost has three components. First, a one-time audit and cleanup of existing orphans. For a medium-sized OLTP database (a dozen tables, low billions of rows total), this is typically 80 to 200 engineer-hours and runs in 2 to 6 weeks because cleanup batches have to be small enough to not impact production. Second, the schema migration to add the constraint, which uses the same online-schema-change tooling as any DDL change on a busy table. Third, fixing any application code paths that previously tolerated orphans and now fail constraint checks. The third component is the most variable, ranging from a handful of bug fixes to a substantial app refactor.

Category 4: Outgrown Access Pattern

The most expensive schema-debt category and the hardest to pay down. A table designed for a low-volume access pattern (a single-tenant orders table, a global activity log, a single-region inventory table) that has grown into a million-write-per-day workload at high concurrency. The original design did not anticipate the load. Common symptoms: lock contention on a frequently-updated row, lock-table exhaustion on bulk operations, replica lag growing without bound, individual transactions exceeding the database's transaction-size limits.

The remediation options are all expensive. Partitioning the table (range-partitioning by date for log tables, hash-partitioning by tenant for multi-tenant tables) is usually the lowest-cost path because the schema change can be applied without touching application code in many cases. Sharding (splitting the table across multiple physical databases) is the highest-cost path because it requires application changes, cross-shard query rewrites, and a long migration window. The third option is offloading to a different storage system (moving a high-write log to Cassandra or DynamoDB while keeping the OLTP table for low-volume reads). All three options are typically multi-quarter projects with infrastructure cost increases of 50 to 200 percent.

Online Schema Change Cost Reference

For MySQL, the two industry-standard tools are gh-ost (GitHub's replication-aware migration tool) and pt-online-schema-change (Percona's trigger-based tool). For PostgreSQL, pg_repack and the built-in concurrent index operations cover most cases. All three are free, open-source, and battle-tested.

Table Size	gh-ost or pt-osc Duration	Engineer Attention Hours	Extra Disk Required
10 GB	30 min to 2 hrs	2 to 4 hrs	10 GB
100 GB	6 to 18 hrs	8 to 12 hrs	100 GB
500 GB	36 to 96 hrs	24 to 40 hrs	500 GB
2 TB	5 to 14 days	60 to 100 hrs	2 TB

The duration scales roughly linearly with row count and is bounded by replication-lag thresholds (gh-ost throttles when replicas fall behind). The engineer-attention hours include planning, dry-run on a staging replica, execution monitoring, and the cleanup. For tables above 1 TB the migration typically runs across multiple days with engineer check-ins twice a day rather than continuous monitoring.

Decision Heuristics

A few rules of thumb that hold across the case histories:

Index addition on a table under 100 GB is nearly always worth doing immediately. The fix cost is small, the carrying cost is real.
Normalisation repairs are worth doing only when the divergence is causing user-visible bugs. Otherwise the reconciliation job is cheaper than the schema migration.
Adding foreign keys retroactively is worth doing when the same data is also being read by analytical queries or by a downstream system. Otherwise the orphaned-row cleanup tax is the cheaper path.
Re-architecting an outgrown table should wait until at least two of the four symptoms (contention, lag, transaction size, lock-table exhaustion) appear regularly. Pre-emptive re-architecture nearly always picks the wrong target.

Frequently Asked Questions

What is database schema debt?

Decisions in the data model that pay short-term simplicity at the cost of long-term operational pain. Missing indexes on columns that became filter predicates, denormalised tables that drift out of sync, missing foreign-key constraints, columns with wider types than necessary, and tables that have outgrown the access pattern they were originally designed for.

How much does a missing index cost?

On a 100M-row table, a missing index that forces a full scan turns a sub-millisecond lookup into a multi-second query. The user-facing cost is variable; the infrastructure cost is the extra database CPU and I/O. At AWS RDS db.r6g.4xlarge pricing of around $1.27 per hour, a missing index that adds 30 percent CPU to a busy primary costs roughly $3,300 per year in compute alone, before any user-experience or scaling-headroom consideration.

What does an online schema change cost?

For a MySQL table of 100 GB, expect 6 to 18 hours of engineer-monitored execution time using gh-ost or pt-online-schema-change. The tooling is open-source so direct license cost is zero; the cost is engineer attention during the migration window plus the temporary doubling of disk space for the shadow table. For PostgreSQL, pg_repack works similarly with similar cost structure.

How do you measure schema debt?

Five queries against any modern database tell you most of what you need: slow-query log analysis for queries doing full scans, pg_stat_user_indexes for unused indexes, information_schema constraints for missing foreign keys, table_size against row count for over-wide rows, and a simple check for columns whose declared type is wider than their actual data needs.

What is the cost of missing foreign key constraints?

Two costs: orphaned rows that accumulate over time and require periodic cleanup jobs, and the application-level integrity checks that have to be written and maintained because the database is not enforcing them. The cleanup jobs are typically 50 to 200 engineer-hours per year. The application-level checks are scattered across the codebase and are a frequent source of subtle bugs.

Should I always normalise to 3NF?

No. Read-heavy analytical workloads, denormalised reporting tables, and event-sourced systems all have legitimate reasons to skip strict 3NF. The schema-debt question is whether the denormalisation was a deliberate choice for a measured benefit (read performance, snapshot integrity) or an accidental drift from the original model. The first is design. The second is debt.