OOM Killer Crashed MariaDB and Corrupted the Database — Recovering a WordPress Site from InnoDB Wreckage

"Error Establishing a Database Connection" — But the Server Was Fine

A client's WooCommerce store went down at 3am on a Saturday. The monitoring alert said the site was returning a blank white page with the WordPress classic: "Error establishing a database connection." Nginx was running, PHP-FPM was running, the VPS itself was responsive. But MariaDB was dead.

I SSH'd in and tried to start it:

sudo systemctl start mariadb

It failed immediately. No error on the terminal — just a silent non-zero exit. The real story was in the journal:

sudo journalctl -u mariadb --since "3 hours ago" --no-pager

Two things jumped out. First, the OOM killer entry from 3:12am:

kernel: Out of memory: Killed process 14823 (mariadbd) total-vm:2847632kB, anon-rss:1689420kB, file-rss:0kB, shmem-rss:0kB, UID:27 pgtables:4612kB score:412

Then the failed restart attempt, where MariaDB tried to come back up and immediately crashed:

mariadbd: InnoDB: Page [page id: space=4, page number=287] log sequence number 28441927168 is in the future! Current system log sequence number 28439012864.
mariadbd: InnoDB: Database page corruption on disk or a failed file read of tablespace shop/wp_options page [page id: space=4, page number=287]

The OOM killer had terminated MariaDB in the middle of writing to disk. InnoDB's write-ahead log and the actual data files were now out of sync. The database was corrupted.
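As a rough sketch of how that first log line can be found quickly, the helper below filters kernel messages for OOM kills and prints each victim with its resident memory in MB. The function name oom_summary is hypothetical, and the pattern assumes the kernel log layout shown above, which differs between kernel versions.

```shell
# Print "<process> <anon-rss in MB>" for every OOM-kill line read from stdin.
# Assumes the "Killed process PID (name) ... anon-rss:NNNkB" layout shown above.
oom_summary() {
  sed -nE 's/.*Killed process [0-9]+ \(([^)]+)\).*anon-rss:([0-9]+)kB.*/\1 \2/p' |
    awk '{ printf "%s %d\n", $1, $2 / 1024 }'
}

# Typical use on a live server (kernel messages only):
#   sudo journalctl -k --since "6 hours ago" --no-pager | oom_summary
```

Fed the log line from this incident, it reports mariadbd at roughly 1.6GB resident.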

Why the OOM Killer Targeted MariaDB

Before touching the corruption, I needed to understand why the server ran out of memory in the first place. This was a 4GB VPS running:

  • Nginx (roughly 50MB)
  • PHP-FPM with pm = dynamic, pm.max_children = 25, each worker averaging 60MB (up to 1,500MB in total if every worker is busy)
  • MariaDB with innodb_buffer_pool_size = 2G (2,048MB)
  • Redis (allocated 256MB)

Add those up: 50 + 1,500 + 2,048 + 256 = 3,854MB. On a 4GB (4,096MB) VPS. That left 242MB for the operating system, kernel buffers, any log processing, and everything else. With zero swap configured.
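The same worst-case budget can be written as a few lines of shell arithmetic, which makes it easy to re-run whenever a setting changes. This is only a sketch: the function name memory_budget is hypothetical, and it treats the buffer pool as MariaDB's whole footprint, which understates real usage (per-connection buffers come on top).

```shell
# Worst-case memory budget in MB.
# Args: total_ram nginx php_workers php_worker_mb buffer_pool redis
# Prints "<committed> <headroom>".
memory_budget() {
  committed=$(( $2 + $3 * $4 + $5 + $6 ))
  echo "$committed $(( $1 - committed ))"
}

memory_budget 4096 50 25 60 2048 256   # prints: 3854 242
```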

The maths had always been tight, but it worked on a normal day because PHP-FPM rarely hit all 25 workers simultaneously and MariaDB rarely filled the entire buffer pool. On this particular Friday evening, the store ran a promotional email campaign that drove a traffic spike. PHP-FPM scaled up to 22 concurrent workers. The kernel ran out of memory, and the OOM killer picked the process with the highest memory footprint — MariaDB, sitting at 1.6GB of resident memory.

No swap meant no safety buffer. The kernel went straight from "memory pressure" to "kill something" with no intermediate step.

Getting MariaDB Running Again

InnoDB is crash-safe by design. It uses a write-ahead log (the redo log) to replay incomplete transactions after a crash. But when the crash is violent enough — a SIGKILL from the OOM killer mid-flush — the redo log itself can be inconsistent with the tablespace files. Normal crash recovery fails, and MariaDB refuses to start.

The recovery tool for this situation is innodb_force_recovery. It tells MariaDB to skip parts of its normal consistency checks and start anyway, in a degraded mode meant for getting data out rather than for normal operation.

I edited the MariaDB configuration:

sudo nano /etc/mysql/mariadb.conf.d/50-server.cnf

Added under [mysqld]:

innodb_force_recovery = 1

Started MariaDB:

sudo systemctl start mariadb

It came up. Level 1 (SRV_FORCE_IGNORE_CORRUPT) tells InnoDB to keep running even if it detects corrupt pages, rather than crashing. This is the gentlest recovery mode — it skips corrupt pages but still runs the redo log, still allows background threads, and still enforces most consistency checks.

At this level, you can read data and run mysqldump. Since MariaDB 10.2.7, write operations (INSERT, UPDATE, DELETE) are permitted at recovery levels 1 through 3, but I treat the database as read-only during recovery regardless — the goal is to get the data out via mysqldump, not to run the application against a damaged tablespace.
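An alternative to editing 50-server.cnf in place is a throwaway drop-in file, so that leaving recovery mode is just a matter of deleting it. The sketch below assumes a Debian-style layout where every .cnf file in /etc/mysql/mariadb.conf.d/ is included in alphabetical order; the function name and the file name zz-force-recovery.cnf are hypothetical.

```shell
# Write a drop-in that enables force recovery at the given level.
# Args: level (1-6), config directory. The file name is a made-up example.
write_force_recovery_dropin() {
  case "$1" in
    [1-6]) ;;
    *) echo "level must be between 1 and 6" >&2; return 1 ;;
  esac
  printf '[mysqld]\ninnodb_force_recovery = %s\n' "$1" > "$2/zz-force-recovery.cnf"
}

# On the server (as root), followed by a MariaDB restart:
#   write_force_recovery_dropin 1 /etc/mysql/mariadb.conf.d
# To leave recovery mode, delete the file and restart again.
```

Keeping the setting in its own file also makes it harder to forget: a stray innodb_force_recovery line left in the main config is an easy mistake at 3am.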

Assessing the Damage

First, I needed to know which tables were actually corrupt versus which were fine. I ran mysqlcheck across the WordPress database:

mysqlcheck -u root -p shop --check --extended

The output flagged three tables:

shop.wp_options
warning  : InnoDB: The B-tree of index PRIMARY is corrupted.
error    : Corrupt

shop.wp_wc_orders
warning  : InnoDB: The B-tree of index PRIMARY is corrupted.
error    : Corrupt

shop.wp_actionscheduler_actions
warning  : InnoDB: The B-tree of index PRIMARY is corrupted.
error    : Corrupt

wp_options — the most critical table in any WordPress installation. Every page load queries it. wp_wc_orders — the HPOS order storage table. And wp_actionscheduler_actions — WooCommerce's background job queue. All three were tables with high write frequency, which makes sense — they were most likely to have open transactions when the OOM killer struck.

The remaining 60+ tables checked clean.
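With 60+ tables scrolling past, it helps to reduce the mysqlcheck output to just the names of the damaged tables. A minimal sketch, assuming the output layout shown above (table name first, then indented-by-type message lines); the function name corrupt_tables is hypothetical.

```shell
# Read mysqlcheck output on stdin and print the tables reported as Corrupt.
# Message lines look like "warning  : ..." or "error    : Corrupt"; any other
# non-empty line is taken to start with a table name.
corrupt_tables() {
  awk '
    /^(note|warning|error|status|info)[ \t]*:/ {
      if ($0 ~ /^error[ \t]*:[ \t]*Corrupt/ && current != "") print current
      next
    }
    NF >= 1 { current = $1 }
  '
}

# Usage:
#   mysqlcheck -u root -p shop --check --extended | corrupt_tables
```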

The Dump-and-Restore Process

With innodb_force_recovery = 1, I could still read data from the corrupt tables. InnoDB would skip the damaged pages, which meant some rows might be missing from the dump, but the bulk of the data would survive.

I dumped the entire database:

mysqldump -u root -p --single-transaction --routines --triggers shop > /root/shop_recovery_$(date +%Y%m%d_%H%M%S).sql

The --single-transaction flag is important — it takes a consistent snapshot without locking tables, which matters because innodb_force_recovery mode does not support LOCK TABLES.

During the dump, MariaDB logged warnings about the three corrupt tables but completed the export. I checked the dump file for obvious problems:

tail -5 /root/shop_recovery_20260518_034500.sql

The file ended with -- Dump completed — good. An incomplete dump would have been truncated without that footer.
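That footer check is easy to script so a truncated dump is never restored by accident. A small sketch follows; the function name dump_is_complete is hypothetical, and it assumes the dump was taken with comments enabled (the default), since the footer is written as a comment.

```shell
# Succeeds only if the last lines of the dump contain mysqldump's footer.
dump_is_complete() {
  tail -n 5 "$1" | grep -q -e '-- Dump completed'
}

# Usage:
#   dump_is_complete /root/shop_recovery_20260518_034500.sql || echo "dump is truncated"
```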

I also counted rows in the critical tables to compare later:

mysql -u root -p -e "SELECT COUNT(*) FROM shop.wp_options;" 2>/dev/null
mysql -u root -p -e "SELECT COUNT(*) FROM shop.wp_wc_orders;" 2>/dev/null

wp_options returned 4,847 rows. wp_wc_orders returned 31,206. I noted these numbers.

Next, I disabled force recovery and stopped MariaDB:

sudo sed -i '/innodb_force_recovery/d' /etc/mysql/mariadb.conf.d/50-server.cnf
sudo systemctl stop mariadb

Then the nuclear step — remove the corrupted tablespace files and let MariaDB rebuild from scratch:

sudo mkdir -p /root/recovery_backup
sudo mv /var/lib/mysql/ib_logfile0 /root/recovery_backup/
sudo mv /var/lib/mysql/ib_logfile1 /root/recovery_backup/
sudo mv /var/lib/mysql/ibdata1 /root/recovery_backup/
sudo mv /var/lib/mysql/shop/ /root/recovery_backup/

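I moved them rather than deleting them: never destroy evidence of a crash until you are certain the recovery succeeded. Be aware that ibdata1 holds the InnoDB data dictionary for the whole server, not just for shop, so any other schema with InnoDB tables needs the same dump-and-restore treatment before this step.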
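The move itself can be wrapped so it skips files that do not exist, which matters because, as far as I know, MariaDB 10.5 and later keep a single ib_logfile0 and have no ib_logfile1. A sketch under that assumption; the function name quarantine_files is hypothetical.

```shell
# Move the named entries from the data directory into a backup directory,
# skipping any that are not present. MariaDB must be stopped first.
# Args: datadir backup_dir name...
quarantine_files() {
  datadir="$1"; dest="$2"; shift 2
  mkdir -p "$dest" || return 1
  for name in "$@"; do
    if [ -e "$datadir/$name" ]; then
      mv "$datadir/$name" "$dest/" || return 1
      echo "moved $name"
    else
      echo "skipped $name (not present)"
    fi
  done
}

# Usage (as root):
#   quarantine_files /var/lib/mysql /root/recovery_backup ib_logfile0 ib_logfile1 ibdata1 shop
```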

Started MariaDB fresh:

sudo systemctl start mariadb

MariaDB initialised new InnoDB system tablespace files and started cleanly. I recreated the database and restored:

mysql -u root -p -e "CREATE DATABASE shop CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
mysql -u root -p shop < /root/shop_recovery_20260518_034500.sql

The restore took about 4 minutes for a 1.2GB dump file. I verified row counts:

mysql -u root -p -e "SELECT COUNT(*) FROM shop.wp_options;"
mysql -u root -p -e "SELECT COUNT(*) FROM shop.wp_wc_orders;"

wp_options: 4,831 rows — 16 rows lost from the corrupt pages. Most likely transients and autoloaded cache entries that WordPress would regenerate on the next page load. wp_wc_orders: 31,206 — all orders intact. The corruption in that table had hit an index page rather than a data page, so the data survived even though the index was damaged.
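Comparing counts by eye works for two tables; for more, the before and after numbers can be saved to files and diffed. A minimal sketch, assuming each file holds one "table count" pair per line; the function name count_diff and the file names in the usage comment are hypothetical.

```shell
# Compare two files of "<table> <row count>" lines and print every table whose
# count changed, as "<table> <before> <after> <difference>".
count_diff() {
  awk 'NR == FNR { before[$1] = $2; next }
       ($1 in before) && before[$1] != $2 { print $1, before[$1], $2, $2 - before[$1] }' "$1" "$2"
}

# One way to produce a line of such a file (-N drops the header, -B gives tab-separated output):
#   mysql -u root -p -N -B -e "SELECT 'wp_options', COUNT(*) FROM shop.wp_options" >> counts_before.txt
# Then, after the restore:
#   count_diff counts_before.txt counts_after.txt
```

For the numbers in this recovery it prints a single line, wp_options 4847 4831 -16, and stays silent about wp_wc_orders.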

Bringing the Site Back

With the database restored, I tested WordPress:

wp option get siteurl --path=/home/store/htdocs

It returned the correct URL. I loaded the site in a browser — homepage rendered, WooCommerce products appeared, the cart worked. One quick check for the client's peace of mind:

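wp wc shop_order list --status=processing --format=count --user=1 --path=/home/store/htdocs

The wp wc commands need a --user with permission to read orders; user ID 1 is assumed here to be an administrator. All orders in processing status were accounted for. The store was back.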

Total downtime: 47 minutes from the OOM kill to the site serving pages again. Most of that was the dump and restore — the actual diagnosis took under 5 minutes once I saw the journal output.

Fixing the Root Cause — Memory Right-Sizing

Getting the site back was the emergency. Preventing it from happening again was the real job. The server's memory allocation was fundamentally broken — 4GB of RAM with services configured to consume 3.85GB leaves no margin.

1. Right-sized innodb_buffer_pool_size

The original 2GB buffer pool was wildly oversized for this workload. I checked how much data it actually had to cache:

SELECT
  ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) AS total_mb
FROM information_schema.tables
WHERE table_schema = 'shop';

The entire database was 380MB on disk. Allocating 2GB of buffer pool for a 380MB dataset was wasting 1.6GB of RAM on empty cache space. I reduced it to 512MB — enough to hold the entire dataset in memory with headroom for growth:

[mysqld]
innodb_buffer_pool_size = 512M

2. Right-sized PHP-FPM workers

Twenty-five max children at 60MB each was the other half of the problem. The formula I use:

max_children = (Available RAM - MariaDB - Redis - OS overhead) / average worker size

With the new buffer pool: (4096 - 512 - 256 - 512) / 60 = 46 workers theoretically, but I set it to 15 to leave genuine headroom rather than running at the knife-edge again:

pm = dynamic
pm.max_children = 15
pm.start_servers = 4
pm.min_spare_servers = 2
pm.max_spare_servers = 6

Fifteen workers at 60MB = 900MB. Total committed memory: 512 (MariaDB) + 900 (PHP-FPM) + 256 (Redis) + 50 (Nginx) + 512 (OS) = 2,230MB. On a 4GB VPS, that leaves 1.8GB of genuine breathing room.
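Both inputs to that formula can be checked from the shell. The sketch below averages worker RSS values and then applies the formula; the function names are hypothetical, and RSS counts shared memory once per worker, so the average errs on the high side.

```shell
# Average a column of RSS values (kB, one per line) and print whole MB.
avg_rss_mb() {
  awk '{ sum += $1; n++ } END { if (n) printf "%d\n", sum / n / 1024 }'
}

# Theoretical worker ceiling from the formula above (all values in MB).
# Args: total_ram mariadb redis os_overhead avg_worker
max_children_ceiling() {
  echo $(( ($1 - $2 - $3 - $4) / $5 ))
}

# Measuring real workers (the process name php-fpm8.2 is an assumption):
#   ps -o rss= -C php-fpm8.2 | avg_rss_mb
max_children_ceiling 4096 512 256 512 60   # prints 46
```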

3. Added swap as a safety buffer

A VPS with zero swap is a server with no parachute. I added a 2GB swapfile:

sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

And tuned vm.swappiness low so the kernel only uses swap under genuine pressure, not as a general-purpose extension of RAM:

echo 'vm.swappiness = 10' | sudo tee -a /etc/sysctl.d/99-swap.conf
sudo sysctl -p /etc/sysctl.d/99-swap.conf

Swap is not a performance tool — it is a crash-prevention tool. A VPS that swaps under load will be slow. A VPS that runs out of memory without swap will have its database killed and potentially corrupted. Slow is better than corrupt.
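It is worth confirming that the swapfile is actually active, because swapon can reject files created with fallocate on some filesystems. A small sketch that reads the total from /proc/meminfo; the function name swap_total_mb is hypothetical, and the optional argument exists only so it can be pointed at a saved copy of the file.

```shell
# Print SwapTotal in MB from meminfo-formatted text (defaults to /proc/meminfo).
swap_total_mb() {
  awk '/^SwapTotal:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# On the server:
#   swap_total_mb              # about 2047 for a 2GB swapfile
#   swapon --show              # lists active swap areas
#   sysctl vm.swappiness       # should report 10 after the change
```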

4. Protected MariaDB from the OOM killer

Even with correct memory sizing, traffic spikes can push memory usage beyond predictions. I configured systemd to lower MariaDB's OOM score so the kernel kills PHP-FPM workers (which are stateless and recoverable) before touching the database (which is not):

sudo systemctl edit mariadb

Added:

[Service]
OOMScoreAdjust=-500

This does not make MariaDB immune to the OOM killer — that would be dangerous, because if MariaDB is genuinely leaking memory, making it unkillable would crash the entire server. It just moves MariaDB to the back of the queue, behind PHP-FPM workers that can be safely killed and restarted.
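The override only takes effect after MariaDB is restarted, so it is worth reading the value back from the running process. A sketch, with a hypothetical function name; the second argument is an alternative proc root and is there purely so the function can be tried without a live MariaDB.

```shell
# Print the oom_score_adj of a PID.
# Args: pid [proc_root, default /proc]
oom_adj_of() {
  cat "${2:-/proc}/$1/oom_score_adj"
}

# After restarting MariaDB, this should print -500:
#   oom_adj_of "$(systemctl show -p MainPID --value mariadb)"
```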

Monitoring to Catch Memory Pressure Early

I added a simple check to the server's monitoring crontab:

#!/bin/bash
# Alert when available memory runs low or swap starts filling up.
ALERT_TO="admin@example.com"  # placeholder: replace with your alert address

AVAILABLE_MB=$(free -m | awk '/^Mem:/ {print $7}')   # "available" column
SWAP_USED_MB=$(free -m | awk '/^Swap:/ {print $3}')  # "used" column

if [ "$AVAILABLE_MB" -lt 200 ]; then
  echo "Low memory alert: ${AVAILABLE_MB}MB available" | mail -s "Memory Alert - $(hostname)" "$ALERT_TO"
fi

if [ "$SWAP_USED_MB" -gt 100 ]; then
  echo "Swap usage alert: ${SWAP_USED_MB}MB in use" | mail -s "Swap Alert - $(hostname)" "$ALERT_TO"
fi

Any swap usage above 100MB means the server is under memory pressure — not an emergency, but an early warning that the memory budget needs revisiting before the OOM killer gets involved.

I also added MariaDB uptime monitoring via WP-CLI:

wp db query "SHOW GLOBAL STATUS LIKE 'Uptime';" --path=/home/store/htdocs

MariaDB always starts a little after the machine does, so a small gap is normal. If its uptime is substantially lower than the server's uptime, it has been restarted since boot, and unplanned restarts need investigation.
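The comparison can be automated with a few lines of shell. This is a sketch only: the function name and the 300-second grace period are assumptions, chosen to ignore the normal delay between boot and MariaDB starting.

```shell
# Succeeds when the database started noticeably later than the machine did.
# Args: db_uptime_seconds system_uptime_seconds [grace_seconds, default 300]
db_restarted_since_boot() {
  [ $(( $2 - $1 )) -gt "${3:-300}" ]
}

# Gathering the two values on the server:
#   db_up=$(wp db query "SHOW GLOBAL STATUS LIKE 'Uptime';" --skip-column-names --path=/home/store/htdocs | awk '{print $2}')
#   sys_up=$(cut -d. -f1 /proc/uptime)
#   db_restarted_since_boot "$db_up" "$sys_up" && echo "MariaDB restarted since boot"
```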

What innodb_force_recovery Levels Actually Do

For reference, because you will need this at 3am and won't want to read documentation:

Level | What it does                                     | Safe to dump?
1     | Ignores corrupt pages, continues running         | Yes
2     | Prevents the purge thread from running           | Yes
3     | Skips transaction rollback after crash recovery  | Yes
4     | Prevents insert buffer merge operations          | Mostly (some data may be stale)
5     | Skips undo log processing on startup             | Risky (data integrity not guaranteed)
6     | Skips redo log roll-forward on startup           | Last resort (expect data loss)

Always start at 1. Only increase if MariaDB will not start at the current level. If you reach 4 or above, your dump will likely contain inconsistencies — compare row counts, check recent orders manually, and verify critical data before trusting the restore.

The Takeaway

InnoDB is remarkably resilient. Ordinary restarts and crashes (a systemctl restart, a clean shutdown, even a kernel panic on a journaling filesystem) almost never cause data corruption. InnoDB's write-ahead log handles all of that.

But the OOM killer is different. It sends SIGKILL, which cannot be caught or handled. There is no graceful shutdown, no final flush, no checkpoint. If MariaDB is in the middle of writing a page to disk — which, on a busy WooCommerce store, it almost always is — the page gets truncated. The redo log says one thing, the tablespace file says another, and InnoDB refuses to reconcile them.

The fix is not better crash recovery. The fix is never letting it happen. Right-size your memory allocation so MariaDB and PHP-FPM are not fighting over the same RAM. Add swap as a safety valve. Protect MariaDB's OOM score. Monitor memory pressure before it becomes memory exhaustion.


Stop Firefighting. Start Maintaining.

I manage 70+ WordPress sites for agencies and businesses. Whether you need ongoing maintenance, emergency support, or a one-off performance fix — I can help.

If your VPS is running WordPress without proper memory tuning or monitoring, this scenario is a matter of when, not if. Check out the maintenance plans, or read about how I handled a WooCommerce store where MariaDB ran out of connections under a different kind of database pressure.
