How a Cache Stampede Took Down a WooCommerce Store During a Flash Sale
· 12 min read
1,200 Orders an Hour — Then Nothing
A client running a WooCommerce store on a 4-core VPS messaged me mid-afternoon on a Thursday. They'd launched a 48-hour flash sale that morning. For the first three hours, things were fine — the store was processing around 1,200 orders per hour, comfortably within the server's capacity. Redis object cache was enabled, PHP-FPM had 40 workers, MariaDB was tuned for the workload. I'd set it all up six months earlier.
Then at 14:12, response times went from 180ms to 9 seconds. Checkout started throwing 504 errors. The client was watching orders flatline in real time.
I SSH'd in expecting the usual suspects — PHP-FPM worker exhaustion, a runaway cron job, maybe a plugin misbehaving under load. What I found was subtler and more destructive: a cache stampede triggered by a single update_option() call that was invalidating the WordPress object cache on every request.
The First Clue: MariaDB Was the Bottleneck
PHP-FPM status showed all 40 workers active, but none were stuck in a loop. They were all waiting:
SCRIPT STATE
/index.php Reading headers (waiting for upstream)
/index.php Reading headers (waiting for upstream)
/?wc-ajax=checkout Reading headers (waiting for upstream)
"Waiting for upstream" means PHP-FPM was waiting on MariaDB. I checked the process list:
SHOW FULL PROCESSLIST;
Thirty-seven connections were running the same query:
SELECT option_name, option_value FROM wp_options WHERE autoload = 'yes';
That's the alloptions query — the one WordPress runs on every single request to load all autoloaded options into the object cache. When Redis is working properly, this query runs once, gets cached, and doesn't hit the database again until something invalidates it. Seeing 37 concurrent copies of it meant the cache was being blown away on nearly every request.
Understanding the Stampede
WordPress stores all autoloaded options in a single Redis key: wp_:options:alloptions. When any code calls update_option() on an autoloaded option, WordPress updates the cached array and writes it back. That sounds harmless — but under concurrent load, the timing creates a race condition. If process A updates the cached array while processes B through Z are mid-request with the old version, and a cache eviction or serialisation conflict occurs, those processes see a cache miss and all fire the same SELECT to rebuild it.
On a quiet site, this is invisible — one extra query every few minutes is nothing. But under heavy traffic, the timing matters. If the cache is invalidated while 40 PHP-FPM workers are handling concurrent requests, all 40 simultaneously discover the cache miss and all 40 fire the same SELECT query at MariaDB. That's the stampede.
On this store, the alloptions data was 3.2MB — 2,847 rows of autoloaded options accumulated over two years of plugin installations. Each of those 37 concurrent queries was a 3.2MB read from disk. MariaDB's InnoDB buffer pool was already under pressure from WooCommerce order writes, and the stampede pushed it past the tipping point.
Finding the Trigger
The question was: what kept invalidating the cache? I needed to find which update_option() call was firing during normal traffic.
I added a temporary mu-plugin to log every autoloaded option update:
<?php
add_action('updated_option', function ($option, $old, $new) {
if (wp_load_alloptions() && isset(wp_load_alloptions()[$option])) {
error_log("[alloptions-invalidation] {$option} updated by " . wp_debug_backtrace_summary());
}
}, 10, 3);
Within 60 seconds, the debug log filled with entries:
[alloptions-invalidation] _transient_wc_count_comments updated by ...
[alloptions-invalidation] woocommerce_tracker_last_send updated by ...
[alloptions-invalidation] cron updated by ...
[alloptions-invalidation] _transient_wc_count_comments updated by ...
Three culprits were hammering autoloaded options on every request:
-
_transient_wc_count_comments— WooCommerce recalculates comment counts for order notes and stores the result as an autoloaded transient. On a store processing this many orders, new order notes were being created constantly, invalidating this transient on nearly every request. -
woocommerce_tracker_last_send— WooCommerce's usage tracking writes a timestamp towp_optionswithautoload = yes. Even with tracking opted out, the check-and-update logic was firing. -
cron— WordPress stores the entire cron schedule inwp_optionsas an autoloaded option. Every time a scheduled event fires or is registered, this row gets updated, which blows the entirealloptionscache.
Under normal traffic, these updates might fire a few times per minute — annoying but survivable. During the flash sale, with hundreds of concurrent requests, each invalidation triggered a stampede of 20-40 identical database queries.
The Fix: Three Layers
Layer 1: Stop the unnecessary invalidations
First, I stopped the bleeding. The cron option being autoloaded is a known WordPress core design issue (Trac #31245). I switched it to non-autoloaded:
UPDATE wp_options SET autoload = 'no' WHERE option_name = 'cron';
WordPress will still load it when needed — it just won't be part of the alloptions cache key, so updating it won't invalidate everything else.
For the WooCommerce tracker, I disabled it explicitly:
add_filter('woocommerce_allow_tracking', '__return_false');
add_filter('woocommerce_tracker_last_send_time', function () {
return time();
});
For the comment count transient, WooCommerce should store this in the object cache rather than in wp_options. I added a filter to skip the option-based storage:
add_filter('pre_update_option__transient_wc_count_comments', function ($value, $old_value) {
set_transient('wc_count_comments_cached', $value, 3600);
return $old_value;
}, 10, 2);
By returning the old value, update_option() sees no change and skips the database write entirely. The fresh count is stored in a separate transient key, so it doesn't trigger an alloptions invalidation.
Layer 2: Shrink the alloptions payload
The 3.2MB alloptions payload was the multiplier that made each stampede so expensive. I audited what was actually in there:
SELECT option_name, LENGTH(option_value) AS size
FROM wp_options
WHERE autoload IN ('yes', 'on', 'auto', 'auto-on')
ORDER BY size DESC
LIMIT 20;
The top offenders:
| option_name | size |
|---|---|
_site_transient_update_plugins |
487,291 |
rewrite_rules |
312,847 |
cron |
298,102 |
_transient_wc_attribute_taxonomies |
89,441 |
litespeed.conf |
67,229 |
Half a megabyte in a single transient row. I switched the biggest offenders to autoload = no:
UPDATE wp_options SET autoload = 'no'
WHERE option_name IN (
'_site_transient_update_plugins',
'_site_transient_update_themes',
'rewrite_rules',
'cron'
);
rewrite_rules is safe to un-autoload — WordPress calls get_option('rewrite_rules') once during URL routing, and with Redis, that single lookup is cached. The site transients are only needed in the admin dashboard, not on the frontend.
After the cleanup, the alloptions payload dropped from 3.2MB to 1.1MB. Each cache miss now cost a third of the previous I/O.
Layer 3: Implement stampede protection
The real fix was preventing the stampede itself. Even with fewer invalidations, some are unavoidable — every legitimate update_option() on an autoloaded option triggers one.
I added a cache lock using Redis's atomic SET NX operation:
add_action('muplugins_loaded', function () {
if (!wp_using_ext_object_cache()) return;
add_filter('pre_wp_load_alloptions', function ($pre) {
$lock_key = 'alloptions_lock';
$cached = wp_cache_get('alloptions', 'options');
if ($cached !== false) return $cached;
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
$acquired = $redis->set($lock_key, getmypid(), ['NX', 'EX' => 5]);
if (!$acquired) {
for ($i = 0; $i < 5; $i++) {
usleep(50000); // 50ms
$cached = wp_cache_get('alloptions', 'options');
if ($cached !== false) return $cached;
}
}
return $pre;
});
});
The pre_wp_load_alloptions filter (available since WordPress 6.3) hooks into wp_load_alloptions(), which is the actual function that queries and caches all autoloaded options. When a cache miss occurs, the first process to arrive acquires a 5-second lock and proceeds to query the database. Subsequent processes retry up to five times at 50ms intervals — by which time the first process has usually completed the query and repopulated the cache. If the lock holder crashes, the 5-second TTL ensures the lock is automatically released.
This turned 37 concurrent identical queries into one query plus 36 brief waits. MariaDB's load dropped immediately.
The Result
After deploying all three layers, I watched the next traffic peak. The flash sale was still running, traffic was comparable to when it crashed:
- MariaDB connections during peak: 8-12 (down from 37+)
- Average response time: 210ms (down from 9 seconds)
alloptionsquery frequency: once every 15-30 seconds (down from dozens per second)- Zero 504 errors for the remaining 30 hours of the sale
How to Check If You're Vulnerable
If you're running Redis object cache on a WooCommerce store, check two things.
First, measure your alloptions payload:
SELECT SUM(LENGTH(option_value)) AS total_bytes
FROM wp_options
WHERE autoload IN ('yes', 'on', 'auto', 'auto-on');
If it's over 1MB, you have bloat that will amplify any stampede. Anything over 2MB is a ticking time bomb under traffic.
Second, check how often the alloptions key is being rebuilt. Run redis-cli MONITOR in one terminal and load your site in another:
redis-cli MONITOR | grep "alloptions"
If you see DEL commands on the alloptions key multiple times per second during normal traffic, something is writing to autoloaded options too frequently. Find it and stop it before your next traffic spike does it for you.
The Underlying Problem
This isn't a WordPress bug exactly — it's a design trade-off baked into the options API. Loading all autoloaded options in one query is efficient for the common case. But the invalidation strategy is all-or-nothing: update one option, lose the entire cache. WordPress core ticket #31245 has been open since 2015, proposing granular option caching. Until it lands, every site with Redis object cache is one noisy plugin away from a stampede.
The lesson is that caching doesn't just need to work — it needs to fail gracefully under concurrent load. A single-key cache that invalidates atomically is fine for a brochure site. For a WooCommerce store doing 1,200 orders an hour, it's an architectural hazard that needs active mitigation.
Stop Firefighting. Start Maintaining.
I manage 70+ WordPress sites for agencies and businesses. Whether you need ongoing maintenance, emergency support, or a one-off performance fix — I can help.
