OOM Killer Crashed MariaDB and Corrupted the Database — Recovering a WordPress Site from InnoDB Wreckage
"Error Establishing a Database Connection" — But the Server Was Fine
A client's WooCommerce store went down at 3am on a Saturday. The monitoring alert said the site was returning a blank white page with the WordPress classic: "Error establishing a database connection." Nginx was running, PHP-FPM was running, the VPS itself was responsive. But MariaDB was dead.
I SSH'd in and tried to start it:
sudo systemctl start mariadb
It failed immediately. No error on the terminal — just a silent non-zero exit. The real story was in the journal:
sudo journalctl -u mariadb --since "3 hours ago" --no-pager
Two things jumped out. First, the OOM killer entry from 3:12am:
kernel: Out of memory: Killed process 14823 (mariadbd) total-vm:2847632kB, anon-rss:1689420kB, file-rss:0kB, shmem-rss:0kB, UID:27 pgtables:4612kB score:412
Then the failed restart attempt, where MariaDB tried to come back up and immediately crashed:
mariadbd: InnoDB: Page [page id: space=4, page number=287] log sequence number 28441927168 is in the future! Current system log sequence number 28439012864.
mariadbd: InnoDB: Database page corruption on disk or a failed file read of tablespace shop/wp_options page [page id: space=4, page number=287]
The OOM killer had terminated MariaDB in the middle of writing to disk. InnoDB's write-ahead log and the actual data files were now out of sync. The database was corrupted.
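To pull the victim and its resident memory out of a kernel line like the one above, a small parser is handy. This is a minimal sketch that assumes the log layout matches the line quoted in this post; the function name parse_oom_line is hypothetical.

```shell
#!/bin/sh
# Extract the process name and anonymous RSS (in MB) from an OOM-killer log line.
# Assumes the "Killed process PID (name) ... anon-rss:NNNkB" layout shown above.
parse_oom_line() {
    local line name rss_kb
    line=$1
    name=$(printf '%s\n' "$line" | sed -n 's/.*Killed process [0-9]* (\([^)]*\)).*/\1/p')
    rss_kb=$(printf '%s\n' "$line" | sed -n 's/.*anon-rss:\([0-9]*\)kB.*/\1/p')
    # Fail if the line does not look like an OOM kill record
    if [ -z "$name" ] || [ -z "$rss_kb" ]; then
        return 1
    fi
    echo "$name $((rss_kb / 1024))MB"
}

parse_oom_line 'kernel: Out of memory: Killed process 14823 (mariadbd) total-vm:2847632kB, anon-rss:1689420kB, file-rss:0kB, shmem-rss:0kB, UID:27 pgtables:4612kB score:412'
# prints: mariadbd 1649MB
```

On a live server the input would typically come from the kernel log, for example by filtering `journalctl -k --no-pager` for "Killed process" and feeding each matching line to the function.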
Why the OOM Killer Targeted MariaDB
Before touching the corruption, I needed to understand why the server ran out of memory in the first place. This was a 4GB VPS running:
- Nginx (roughly 50MB)
- PHP-FPM with pm = dynamic, pm.max_children = 25, each worker averaging 60MB (up to 1,500MB)
- MariaDB with innodb_buffer_pool_size = 2G (2,048MB)
- Redis (allocated 256MB)
Add those up: 50 + 1,500 + 2,048 + 256 = 3,854MB. On a 4GB (4,096MB) VPS. That left 242MB for the operating system, kernel buffers, any log processing, and everything else. With zero swap configured.
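The same budget check is easy to script so it can be rerun whenever a setting changes. A rough sketch using the figures quoted above; memory_headroom is an illustrative name, and the inputs are worst-case estimates rather than measured values.

```shell
#!/bin/sh
# Sum worst-case memory commitments (MB) and report headroom against total RAM.
# Usage: memory_headroom TOTAL_MB SERVICE_MB [SERVICE_MB ...]
memory_headroom() {
    local total_mb committed mb
    total_mb=$1
    shift
    committed=0
    for mb in "$@"; do
        committed=$((committed + mb))
    done
    echo "committed=${committed}MB headroom=$((total_mb - committed))MB"
}

# 4GB VPS: Nginx, PHP-FPM (25 workers x 60MB), MariaDB buffer pool, Redis
memory_headroom 4096 50 $((25 * 60)) 2048 256
# prints: committed=3854MB headroom=242MB
```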
The maths had always been tight, but it worked on a normal day because PHP-FPM rarely hit all 25 workers simultaneously and MariaDB rarely filled the entire buffer pool. On this particular Friday evening, the store ran a promotional email campaign that drove a traffic spike. PHP-FPM scaled up to 22 concurrent workers. The kernel ran out of memory, and the OOM killer picked the process with the highest memory footprint — MariaDB, sitting at 1.6GB of resident memory.
No swap meant no safety buffer. The kernel went straight from "memory pressure" to "kill something" with no intermediate step.
Getting MariaDB Running Again
InnoDB is crash-safe by design. It uses a write-ahead log (the redo log) to replay incomplete transactions after a crash. But when the crash is violent enough — a SIGKILL from the OOM killer mid-flush — the redo log itself can be inconsistent with the tablespace files. Normal crash recovery fails, and MariaDB refuses to start.
The recovery tool for this situation is innodb_force_recovery. It tells MariaDB to skip parts of its normal consistency checks and start anyway in a degraded mode, so that the data can be dumped.
I edited the MariaDB configuration:
sudo nano /etc/mysql/mariadb.conf.d/50-server.cnf
Added under [mysqld]:
innodb_force_recovery = 1
Started MariaDB:
sudo systemctl start mariadb
It came up. Level 1 (SRV_FORCE_IGNORE_CORRUPT) tells InnoDB to keep running even if it detects corrupt pages, rather than crashing. This is the gentlest recovery mode — it skips corrupt pages but still runs the redo log, still allows background threads, and still enforces most consistency checks.
At this level, you can read data and run mysqldump. Since MariaDB 10.2.7, write operations (INSERT, UPDATE, DELETE) are permitted at recovery levels 1 through 3, but I treat the database as read-only during recovery regardless — the goal is to get the data out via mysqldump, not to run the application against a damaged tablespace.
Assessing the Damage
First, I needed to know which tables were actually corrupt versus which were fine. I ran mysqlcheck across the WordPress database:
mysqlcheck -u root -p shop --check --extended
The output flagged three tables:
shop.wp_options
warning : InnoDB: The B-tree of index PRIMARY is corrupted.
error : Corrupt
shop.wp_wc_orders
warning : InnoDB: The B-tree of index PRIMARY is corrupted.
error : Corrupt
shop.wp_actionscheduler_actions
warning : InnoDB: The B-tree of index PRIMARY is corrupted.
error : Corrupt
wp_options — the most critical table in any WordPress installation. Every page load queries it. wp_wc_orders — the HPOS order storage table. And wp_actionscheduler_actions — WooCommerce's background job queue. All three were tables with high write frequency, which makes sense — they were most likely to have open transactions when the OOM killer struck.
The remaining 60+ tables checked clean.
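With 60+ tables in the output, it helps to filter mysqlcheck down to just the damaged ones. A minimal sketch, assuming the output layout shown above (a db.table line followed by "warning :" and "error :" detail lines); corrupt_tables is a hypothetical helper name.

```shell
#!/bin/sh
# Read mysqlcheck output on stdin and print only the tables that reported an error.
corrupt_tables() {
    awk '
        # An "error :" detail line belongs to the most recent table name seen
        /^error[ \t]*:/ { if (table != "") { print table; table = "" }; next }
        # Other detail lines carry no table name
        /^(warning|note|info|status)[ \t]*:/ { next }
        # Anything else is a "db.table" line, optionally followed by "OK"
        NF { table = $1 }
    '
}

corrupt_tables <<'EOF'
shop.wp_posts                                      OK
shop.wp_options
warning  : InnoDB: The B-tree of index PRIMARY is corrupted.
error    : Corrupt
EOF
# prints: shop.wp_options
```

In practice the mysqlcheck command from this section would be piped straight into corrupt_tables.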
The Dump-and-Restore Process
With innodb_force_recovery = 1, I could still read data from the corrupt tables. InnoDB would skip the damaged pages, which meant some rows might be missing from the dump, but the bulk of the data would survive.
I dumped the entire database:
mysqldump -u root -p --single-transaction --routines --triggers shop > /root/shop_recovery_$(date +%Y%m%d_%H%M%S).sql
The --single-transaction flag is important — it takes a consistent snapshot without locking tables, which matters because innodb_force_recovery mode does not support LOCK TABLES.
During the dump, MariaDB logged warnings about the three corrupt tables but completed the export. I checked the dump file for obvious problems:
tail -5 /root/shop_recovery_20260518_034500.sql
The file ended with -- Dump completed — good. An incomplete dump would have been truncated without that footer.
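The footer check can be wrapped in a function so a script refuses to continue with a truncated dump. This is a sketch that assumes mysqldump ran with its default comment output; options that suppress comments also suppress the footer. The name dump_is_complete is illustrative.

```shell
#!/bin/sh
# Succeed only if the last lines of the dump contain mysqldump's completion footer.
dump_is_complete() {
    tail -n 5 "$1" | grep -q -e '-- Dump completed'
}

# Example (path is a placeholder):
#   dump_is_complete /path/to/dump.sql || echo "dump looks truncated" >&2
```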
I also counted rows in the critical tables to compare later:
mysql -u root -p -e "SELECT COUNT(*) FROM shop.wp_options;" 2>/dev/null
mysql -u root -p -e "SELECT COUNT(*) FROM shop.wp_wc_orders;" 2>/dev/null
wp_options returned 4,847 rows. wp_wc_orders returned 31,206. I noted these numbers.
Next, I disabled force recovery and stopped MariaDB:
sudo sed -i '/innodb_force_recovery/d' /etc/mysql/mariadb.conf.d/50-server.cnf
sudo systemctl stop mariadb
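Then the nuclear step: move the corrupted tablespace files aside and let MariaDB rebuild from scratch. Be aware that ibdata1 and the redo log are shared by every InnoDB table on the instance, so any other database that uses InnoDB has to be dumped before this step too. On MariaDB 10.5 and later the redo log is a single ib_logfile0, so there may be no ib_logfile1 to move: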
sudo mkdir -p /root/recovery_backup
sudo mv /var/lib/mysql/ib_logfile0 /root/recovery_backup/
sudo mv /var/lib/mysql/ib_logfile1 /root/recovery_backup/
sudo mv /var/lib/mysql/ibdata1 /root/recovery_backup/
sudo mv /var/lib/mysql/shop/ /root/recovery_backup/
I moved them rather than deleting — never destroy evidence of a crash until you are certain the recovery succeeded.
Started MariaDB fresh:
sudo systemctl start mariadb
MariaDB initialised new InnoDB system tablespace files and started cleanly. I recreated the database and restored:
mysql -u root -p -e "CREATE DATABASE shop CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
mysql -u root -p shop < /root/shop_recovery_20260518_034500.sql
The restore took about 4 minutes for a 1.2GB dump file. I verified row counts:
mysql -u root -p -e "SELECT COUNT(*) FROM shop.wp_options;"
mysql -u root -p -e "SELECT COUNT(*) FROM shop.wp_wc_orders;"
wp_options: 4,831 rows — 16 rows lost from the corrupt pages. Most likely transients and autoloaded cache entries that WordPress would regenerate on the next page load. wp_wc_orders: 31,206 — all orders intact. The corruption in that table had hit an index page rather than a data page, so the data survived even though the index was damaged.
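The before-and-after comparison is simple arithmetic, but scripting it avoids misreading a five-digit count at 4am. A small sketch using the counts from this recovery; rows_lost is a hypothetical helper.

```shell
#!/bin/sh
# Report how many rows a table lost between the pre-dump count and the restore.
# Usage: rows_lost TABLE COUNT_BEFORE COUNT_AFTER
rows_lost() {
    local table before after
    table=$1
    before=$2
    after=$3
    echo "$table: $((before - after)) rows lost ($after of $before restored)"
}

rows_lost wp_options 4847 4831
# prints: wp_options: 16 rows lost (4831 of 4847 restored)
rows_lost wp_wc_orders 31206 31206
# prints: wp_wc_orders: 0 rows lost (31206 of 31206 restored)
```

To capture the counts without table borders or column headers, the mysql client's -N and -B options can be added to the COUNT(*) commands above.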
Bringing the Site Back
With the database restored, I tested WordPress:
wp option get siteurl --path=/home/store/htdocs
It returned the correct URL. I loaded the site in a browser — homepage rendered, WooCommerce products appeared, the cart worked. One quick check for the client's peace of mind:
wp wc shop_order list --status=processing --format=count --path=/home/store/htdocs
All pending orders were accounted for. The store was back.
Total downtime: 47 minutes from the OOM kill to the site serving pages again. Most of that was the dump and restore — the actual diagnosis took under 5 minutes once I saw the journal output.
Fixing the Root Cause — Memory Right-Sizing
Getting the site back was the emergency. Preventing it from happening again was the real job. The server's memory allocation was fundamentally broken — 4GB of RAM with services configured to consume 3.85GB leaves no margin.
1. Right-sized innodb_buffer_pool_size
The original 2GB buffer pool was wildly oversized for this workload. I checked how much data the database actually held:
SELECT
ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) AS total_mb
FROM information_schema.tables
WHERE table_schema = 'shop';
The entire database was 380MB on disk. Allocating 2GB of buffer pool for a 380MB dataset was wasting 1.6GB of RAM on empty cache space. I reduced it to 512MB — enough to hold the entire dataset in memory with headroom for growth:
[mysqld]
innodb_buffer_pool_size = 512M
2. Right-sized PHP-FPM workers
Twenty-five max children at 60MB each was the other half of the problem. The formula I use:
max_children = (Available RAM - MariaDB - Redis - OS overhead) / average worker size
With the new buffer pool: (4096 - 512 - 256 - 512) / 60 = 46 workers theoretically, but I set it to 15 to leave genuine headroom rather than running at the knife-edge again:
pm = dynamic
pm.max_children = 15
pm.start_servers = 4
pm.min_spare_servers = 2
pm.max_spare_servers = 6
Fifteen workers at 60MB = 900MB. Total committed memory: 512 (MariaDB) + 900 (PHP-FPM) + 256 (Redis) + 50 (Nginx) + 512 (OS) = 2,230MB. On a 4GB VPS, that leaves 1.8GB of genuine breathing room.
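The worker formula above translates directly into shell arithmetic. A minimal sketch; fpm_max_children is an illustrative name, and the result is only the theoretical ceiling, not a recommended setting.

```shell
#!/bin/sh
# Theoretical PHP-FPM ceiling: RAM left after fixed reservations, divided by the
# average worker size. Integer division rounds down.
# Usage: fpm_max_children RAM_MB MARIADB_MB REDIS_MB OS_MB WORKER_MB
fpm_max_children() {
    local ram_mb mariadb_mb redis_mb os_mb worker_mb
    ram_mb=$1
    mariadb_mb=$2
    redis_mb=$3
    os_mb=$4
    worker_mb=$5
    echo $(( (ram_mb - mariadb_mb - redis_mb - os_mb) / worker_mb ))
}

fpm_max_children 4096 512 256 512 60
# prints: 46
```

Setting pm.max_children well below that ceiling, as done here with 15, is what turns the calculation into actual headroom.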
3. Added swap as a safety buffer
A VPS with zero swap is a server with no parachute. I added a 2GB swapfile:
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
And tuned vm.swappiness low so the kernel only uses swap under genuine pressure, not as a general-purpose extension of RAM:
echo 'vm.swappiness = 10' | sudo tee -a /etc/sysctl.d/99-swap.conf
sudo sysctl -p /etc/sysctl.d/99-swap.conf
Swap is not a performance tool — it is a crash-prevention tool. A VPS that swaps under load will be slow. A VPS that runs out of memory without swap will have its database killed and potentially corrupted. Slow is better than corrupt.
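After enabling the swapfile it is worth confirming the kernel actually sees it. A small sketch that reads SwapTotal from /proc/meminfo; the optional file argument exists only so the function can be pointed at a saved copy, and swap_total_mb is a hypothetical name.

```shell
#!/bin/sh
# Print total configured swap in MB, read from a meminfo-format file.
# Defaults to /proc/meminfo on Linux.
swap_total_mb() {
    awk '/^SwapTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Example:
#   [ "$(swap_total_mb)" -gt 0 ] || echo "no swap configured" >&2
```

A 2GB swapfile usually reports slightly under 2,048MB because mkswap reserves a small header.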
4. Protected MariaDB from the OOM killer
Even with correct memory sizing, traffic spikes can push memory usage beyond predictions. I configured systemd to lower MariaDB's OOM score so the kernel kills PHP-FPM workers (which are stateless and recoverable) before touching the database (which is not):
sudo systemctl edit mariadb
Added:
[Service]
OOMScoreAdjust=-500
This does not make MariaDB immune to the OOM killer. That would be dangerous: if MariaDB is genuinely leaking memory, making it unkillable would crash the entire server. It just moves MariaDB to the back of the queue, behind PHP-FPM workers that can be safely killed and restarted. The new score applies from the next restart of the mariadb service.
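To confirm the adjustment reached the running process, the kernel exposes the value under /proc. A minimal sketch; oom_score_adj_of is a hypothetical helper, and the optional second argument (an alternative proc root) is there only to make it easy to test.

```shell
#!/bin/sh
# Print the OOM score adjustment the kernel currently holds for a PID.
# Usage: oom_score_adj_of PID [PROC_ROOT]
oom_score_adj_of() {
    cat "${2:-/proc}/$1/oom_score_adj"
}

# Example, assuming the unit is named mariadb:
#   oom_score_adj_of "$(systemctl show -p MainPID --value mariadb)"
```

If the override is active, the example should print -500 for the MariaDB main process.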
Monitoring to Catch Memory Pressure Early
I added a simple check to the server's monitoring crontab:
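#!/bin/bash
# Column 7 of the "Mem:" row is "available" memory (what can be used without swapping)
FREE_MB=$(free -m | awk '/^Mem:/ {print $7}')
# Column 3 of the "Swap:" row is swap currently in use
SWAP_USED=$(free -m | awk '/^Swap:/ {print $3}')
# Alert when less than 200MB is available
if [ "$FREE_MB" -lt 200 ]; then
echo "Low memory alert: ${FREE_MB}MB available" | mail -s "Memory Alert - $(hostname)" [email protected]
fi
# Alert when more than 100MB of swap is in use
if [ "$SWAP_USED" -gt 100 ]; then
echo "Swap usage alert: ${SWAP_USED}MB in use" | mail -s "Swap Alert - $(hostname)" [email protected]
fi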
Any swap usage above 100MB means the server is under memory pressure — not an emergency, but an early warning that the memory budget needs revisiting before the OOM killer gets involved.
I also added MariaDB uptime monitoring via WP-CLI:
wp db query "SHOW GLOBAL STATUS LIKE 'Uptime';" --path=/home/store/htdocs
If MariaDB's uptime is lower than the server's uptime, it has been restarted — and unplanned restarts need investigation.
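MariaDB always starts a little after the host boots, so a literal "lower than" comparison would fire on every check. The sketch below adds a tolerance; the 300-second default and the name mariadb_restarted are assumptions for illustration, not values from the original setup.

```shell
#!/bin/sh
# Succeed if MariaDB's uptime trails the host's uptime by more than a tolerance,
# which suggests MariaDB restarted some time after boot.
# Usage: mariadb_restarted DB_UPTIME_SECONDS HOST_UPTIME_SECONDS [TOLERANCE_SECONDS]
mariadb_restarted() {
    local db_uptime host_uptime tolerance
    db_uptime=$1
    host_uptime=$2
    tolerance=${3:-300}
    [ $((host_uptime - db_uptime)) -gt "$tolerance" ]
}

# Hypothetical numbers: database up 2 minutes on a host that has been up for a day
if mariadb_restarted 120 86400; then
    echo "MariaDB restarted after boot: investigate"
fi
# prints: MariaDB restarted after boot: investigate
```

The host figure can be read from the first field of /proc/uptime, and the database figure from the Value column of the SHOW GLOBAL STATUS query above.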
What innodb_force_recovery Levels Actually Do
For reference, because you will need this at 3am and won't want to read documentation:
| Level | What it does | Safe to dump? |
|---|---|---|
| 1 | Ignores corrupt pages, continues running | Yes |
| 2 | Prevents the purge thread from running | Yes |
| 3 | Skips transaction rollback after crash recovery | Yes |
| 4 | Prevents insert buffer merge operations | Mostly — some data may be stale |
| 5 | Skips undo log processing on startup | Risky — data integrity not guaranteed |
| 6 | Skips redo log roll-forward on startup | Last resort — expect data loss |
Always start at 1. Only increase if MariaDB will not start at the current level. If you reach 4 or above, your dump will likely contain inconsistencies — compare row counts, check recent orders manually, and verify critical data before trusting the restore.
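The "start at 1, only go higher if it will not start" rule can be sketched as a loop. Here try_start is a hypothetical callback you supply: it should set innodb_force_recovery to the level it is given, attempt to start MariaDB, and return success only if the service comes up. The default cap of 3 is an assumption chosen so the loop never enters the riskier levels on its own.

```shell
#!/bin/sh
# Try increasing recovery levels and print the first one at which the callback succeeds.
# Usage: first_working_level CALLBACK [MAX_LEVEL]
first_working_level() {
    local try_start max_level level
    try_start=$1
    max_level=${2:-3}
    level=1
    while [ "$level" -le "$max_level" ]; do
        if "$try_start" "$level"; then
            echo "$level"
            return 0
        fi
        level=$((level + 1))
    done
    # No level up to the cap worked; going higher should be a deliberate manual step
    return 1
}

# Stand-in callback for demonstration: pretends MariaDB only starts at level 2 or above
fake_start() { [ "$1" -ge 2 ]; }
first_working_level fake_start
# prints: 2
```

Stopping at the cap and returning failure forces a human decision before trying levels 4 to 6, where the dump can no longer be trusted without verification.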
The Takeaway
InnoDB is remarkably resilient. Routine restarts and ordinary crashes (a systemctl restart, a clean shutdown, even a kernel panic with proper journaling) almost never cause data corruption. InnoDB's write-ahead log handles all of that.
But the OOM killer is different. It sends SIGKILL, which cannot be caught or handled. There is no graceful shutdown, no final flush, no checkpoint. If MariaDB is in the middle of writing a page to disk — which, on a busy WooCommerce store, it almost always is — the page gets truncated. The redo log says one thing, the tablespace file says another, and InnoDB refuses to reconcile them.
The fix is not better crash recovery. The fix is never letting it happen. Right-size your memory allocation so MariaDB and PHP-FPM are not fighting over the same RAM. Add swap as a safety valve. Protect MariaDB's OOM score. Monitor memory pressure before it becomes memory exhaustion.
