When Rows Don’t Die: MVCC, Index Bloat & How PostgreSQL Stores Your Data
Posted at 15-July-2025 / Written by Rohit Bhatt

30-sec summary
PostgreSQL doesn’t just update rows — it leaves the old ones lying around like forgotten leftovers. It’s called MVCC, and while it’s great for concurrency, it can make your indexes bloated and your queries slow. This blog walks through what really happens inside your DB, how to spot bloat, and what to do about it.
Let’s Clear the Air: Updates Don’t Overwrite Rows
Here’s a surprise I wish someone had told me early: when you run an `UPDATE` or `DELETE` in PostgreSQL, the row doesn’t go away. Postgres just marks it as 'dead' and leaves it there. Why? Because of something called MVCC (Multi-Version Concurrency Control), which is how Postgres handles simultaneous reads and writes without locking everything up like a traffic jam.
Inside PostgreSQL: Pages, Tuples, and Metadata
Postgres stores your data in 8KB pages. These pages live inside heap files, and inside each page are rows — or 'tuples' as Postgres likes to call them. Every tuple has a header attached with hidden columns like:
- 1.xmin: Transaction ID that created the row.
- 2.xmax: Transaction ID that deleted or updated it.
- 3.ctid: Physical location of the row (page number + offset).
- 4.null bitmap: Optimized way to keep track of NULL columns without wasting space.
MVCC: Why PostgreSQL is So Dang Concurrent
MVCC lets Postgres show different versions of the same row to different users. When you update a row, Postgres inserts a brand-new version and marks the old one as dead. That way, if another transaction started before your update, it still sees the old row. It's genius — until you realize those dead rows don’t vanish automatically.
Where the Trash Piles Up: Dead Tuples
Postgres leaves dead rows lying around in both the heap and the indexes. That’s like deleting a file from your computer but it still shows up in your storage — until something (or someone) cleans it up. Over time, you end up with a lot of invisible junk. This makes queries slower and inflates your database size.
Meet VACUUM: PostgreSQL’s Cleanup Crew
VACUUM is the janitor that Postgres sends in to clear out dead tuples. It figures out which rows are no longer needed and marks that space as reusable. But here’s the kicker: regular VACUUM doesn’t make the file smaller. That requires VACUUM FULL, which rewrites the entire table and locks it while doing so. So:
- 1.VACUUM: Cleans the mess quietly, no locking, but doesn’t shrink the file.
- 2.VACUUM FULL: Actually frees disk space — but blocks access during cleanup.
The Index Story: Fast Until It’s Not
You add indexes to speed up lookups, right? Great! But here’s the trap — indexes store pointers to specific row versions. So every time you update a row, Postgres adds a new pointer and keeps the old one hanging around. It doesn’t remove that old pointer until you rebuild the index. Over time, the index grows and grows… even if your data doesn’t.
This Is Called Index Bloat
Dead pointers in your indexes make them fat and slow. You still have to scan through them, even if half the entries point to dead rows. VACUUM won’t remove them — it just marks them as unusable. To fix that, you need REINDEX.
Let’s See Bloat in Action
This creates 1 million dead rows and 1 million dead index entries — just from a single update. That’s how easy it is to bloat your DB without realizing it.
1CREATE TABLE users(id SERIAL, name TEXT);
2CREATE INDEX idx_name ON users(name);
3
4INSERT INTO users(name)
5SELECT 'alice' FROM generate_series(1, 1000000);
6
7UPDATE users SET name = 'bob' WHERE name = 'alice';
How to Fix and Avoid Index Bloat
- 1.Make Sure Autovacuum Is Doing Its Job: Postgres runs autovacuum automatically — but you can (and should) tune the settings for high-write tables.
- 2.Use REINDEX CONCURRENTLY: Rebuild indexes in the background without locking the table. Super useful for production systems.
- 3.Don’t Update Unless You Have To: Even `UPDATE users SET status = 'active' WHERE status = 'active'` creates a new version. Avoid it.
- 4.Use fillfactor on indexes: Leave space in index pages for future updates. Example: `CREATE INDEX idx_name ON users(name) WITH (fillfactor = 80);`
How to Monitor Index Size and Bloat
You can use built-in views to keep an eye on your indexes:
1SELECT
2 relname AS index_name,
3 pg_size_pretty(pg_relation_size(indexrelid)) AS size
4FROM
5 pg_stat_user_indexes
6ORDER BY
7 pg_relation_size(indexrelid) DESC;
Want more detailed bloat info? Install the `pgstattuple` extension — it gives you stats like how much of your index is dead weight.
Wrap-Up: Your Database Has a Body — Keep It Clean
PostgreSQL is amazing, but it won’t clean up after itself unless you ask. MVCC gives you concurrency, but it leaves behind clutter. Indexes make reads fast, but without a little care, they become junk drawers. Schedule regular VACUUMs, monitor index sizes, use REINDEX, and don’t do unnecessary writes. Your future self (and your query planner) will thank you.