AI Bots Are Hammering Your WordPress REST API — How I Cut Server Load by 60%
A client's WooCommerce store had been getting progressively slower over a three-week period. Page loads crept from 1.6 seconds to 5+, and checkout timeouts were starting to affect conversions. The usual suspects — plugin updates, database bloat, PHP-FPM misconfiguration — were all clean. The answer was hiding in the access logs.
The Symptom
PHP-FPM workers were consistently saturated. On a server tuned for 20 workers, 18-19 were active during what should have been quiet mid-afternoon hours. But the site's analytics showed only 40-50 concurrent human visitors — nowhere near enough to consume that many workers.
curl -s 'http://127.0.0.1/status?full' | grep -cE '^state:[[:space:]]+Running'
I had the PHP-FPM status page enabled on this server (listening on /status), so I could see active versus idle workers directly. Active workers were hovering at 18-19 regardless of real traffic. Something was consuming PHP resources that had nothing to do with actual customers.
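To see idle and busy workers side by side, the same status output can be tallied by state. This is a minimal sketch rather than part of the original setup: `fpm_worker_summary` is a hypothetical helper name, and it assumes the plain-text `?full` output, in which every worker has its own padded `state:` line.

```shell
# Tally PHP-FPM worker states from the plain-text output of /status?full,
# read on stdin. Anything that is neither Running nor Idle (for example
# "Reading headers" or "Finishing") is counted as "other".
fpm_worker_summary() {
  awk '
    $1 == "state:" {
      if ($2 == "Running") running++
      else if ($2 == "Idle") idle++
      else other++
    }
    END { printf "running=%d idle=%d other=%d\n", running, idle, other }
  '
}

# Example (not run here; assumes the status page is reachable on /status):
#   curl -s 'http://127.0.0.1/status?full' | fpm_worker_summary
```

The request that fetches the status page is itself handled by a worker, so the running count usually includes one worker that is only serving the check.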
The Investigation
I pulled the top request paths from the past hour:
awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -20
The output was immediately telling:
4281 /wp-json/wp/v2/posts?per_page=100&page=1
3744 /wp-json/wp/v2/posts?per_page=100&page=2
2198 /wp-json/wp/v2/pages
1856 /wp-json/wp/v2/categories
1203 /wp-json/wp/v2/tags
947 /wp-json/wp/v2/users
891 /wp-json/wc/store/v1/products
342 /wp-json/wp/v2/comments
156 /
89 /shop/
Over 15,000 REST API requests per hour. The site only had around 120 posts and 30 pages — these bots were re-scraping the entire content catalogue in a loop.
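Because the bots paginate, the same endpoint shows up under many different query strings. A variation that strips the query string groups the hits by route instead. The sketch below assumes the default combined log format, where the request path is field 7; `rest_hits_by_route` is a hypothetical name.

```shell
# Count /wp-json/ requests per route, ignoring query strings, busiest first.
# Reads combined-format access log lines on stdin.
rest_hits_by_route() {
  awk '
    $7 ~ /^\/wp-json\// {
      path = $7
      sub(/\?.*/, "", path)   # drop ?per_page=...&page=... so pages group together
      hits[path]++
    }
    END { for (p in hits) printf "%d %s\n", hits[p], p }
  ' | sort -k1,1nr -k2,2
}

# Example (not run here):
#   rest_hits_by_route < /var/log/nginx/access.log | head -20
```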
I checked the user agents:
grep "wp-json" /var/log/nginx/access.log | awk -F'"' '{print $6}' | sort | uniq -c | sort -rn | head -10
5823 GPTBot/1.2 (+https://openai.com/gptbot)
3912 ClaudeBot/1.0 (+claudebot@anthropic.com)
2741 PetalBot; +https://webmaster.petalsearch.com/
1488 Bytespider; bytedance.com/bot
987 CCBot/2.0 (https://commoncrawl.org/faq/)
412 Mozilla/5.0 (compatible; DataForSeoBot/1.0)
203 python-requests/2.31.0
AI training crawlers, search crawlers, and generic scrapers — all hitting the REST API directly. Every single one of those requests bypassed the Nginx FastCGI page cache because /wp-json/ endpoints return dynamic JSON from PHP. Each request loaded WordPress core, ran the query, serialised the response, and released the worker. At 15,000+ per hour, that's 4+ uncached PHP requests per second on top of real traffic.
The /wp-json/wp/v2/users requests were also leaking usernames — a known enumeration vector I've written about separately. But the bigger problem here wasn't security. It was resource exhaustion.
Why REST API Abuse Is Worse Than Page Scraping
When a bot crawls your normal pages, Nginx can serve a cached HTML response without touching PHP. REST API endpoints, by contrast, are not cached in a typical setup: they return dynamic JSON, often with pagination parameters, and every request becomes a fresh PHP execution.
A single bot requesting /wp-json/wp/v2/posts?per_page=100 triggers a WP_Query that loads 100 posts from the database, serialises them into JSON with all their metadata, and returns a response that can be 200-500KB. Multiply that by several bots running in parallel, paginating through every endpoint, and you've got a silent DDoS that looks like legitimate API traffic.
The Fix: Three Layers
Layer 1: Nginx Rate Limiting for /wp-json/
The brute force post on this site covers rate limiting for wp-login.php. The same technique works for the REST API, but with a more generous limit, since legitimate features (the WooCommerce Store API behind the Cart and Checkout blocks, the block editor, contact forms) call /wp-json/ routinely.
Add to the http block in /etc/nginx/nginx.conf:
limit_req_zone $binary_remote_addr zone=wp_rest:10m rate=5r/s;
limit_req_status 429;
If your site sits behind Cloudflare or another reverse proxy, $binary_remote_addr will be the proxy's IP, not the visitor's. You need to restore the real client IP first — otherwise all traffic shares one rate-limit bucket:
set_real_ip_from 173.245.48.0/20;
set_real_ip_from 103.21.244.0/22;
# ... remaining Cloudflare ranges from https://www.cloudflare.com/ips/
real_ip_header CF-Connecting-IP;
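Typing the Cloudflare ranges by hand is error-prone, so one option is to generate the directives from a list of CIDR ranges. The helper below is a sketch under assumptions: `cf_real_ip_directives` is a hypothetical name, and the output path in the usage comment is only an example of a file your http block might include.

```shell
# Turn a list of CIDR ranges (one per line on stdin) into Nginx
# set_real_ip_from directives, followed by the real_ip_header line.
# Blank lines in the input are skipped.
cf_real_ip_directives() {
  awk 'NF { printf "set_real_ip_from %s;\n", $1 }'
  echo 'real_ip_header CF-Connecting-IP;'
}

# Example (not run here): build an include file from Cloudflare's published
# IPv4 and IPv6 lists. The extra echo guards against a missing final newline.
#   { curl -s https://www.cloudflare.com/ips-v4; echo; \
#     curl -s https://www.cloudflare.com/ips-v6; } \
#     | cf_real_ip_directives > /etc/nginx/conf.d/cloudflare-real-ip.conf
```

Review the generated file and run `nginx -t` before reloading, since a bad range here affects every rate-limit decision.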
Then in the site's server block:
location /wp-json/ {
    limit_req zone=wp_rest burst=15 nodelay;
    try_files $uri $uri/ /index.php?$args;
}
This allows 5 requests per second per IP, with a burst allowance of 15. That's plenty for a human browsing a WooCommerce store (cart updates, checkout API calls, block editor saves) but stops a bot from paginating through your entire content library at speed.
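Before settling on a rate, it can help to check what real clients actually do. The sketch below reports, for each IP, the largest number of /wp-json/ requests it made within any single second. It assumes the combined log format (IP in field 1, timestamp in field 4, path in field 7), and `peak_rest_rate_per_ip` is a hypothetical name.

```shell
# For each client IP, print the highest number of /wp-json/ requests seen in
# any one second, busiest clients first. Reads combined-format lines on stdin.
peak_rest_rate_per_ip() {
  awk '
    $7 ~ /^\/wp-json\// {
      key = $1 " " $4               # client IP + timestamp at one-second resolution
      n = ++per_second[key]
      if (n > peak[$1]) peak[$1] = n
    }
    END { for (ip in peak) printf "%d %s\n", peak[ip], ip }
  ' | sort -k1,1nr -k2,2
}

# Example (not run here):
#   peak_rest_rate_per_ip < /var/log/nginx/access.log | head
```

Whole-second buckets only approximate Nginx's own accounting, which tracks requests at finer granularity, so treat the numbers as a rough guide. Clients whose peak sits well under the burst of 15 are unlikely to see a 429.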
After the rate limit is exceeded, Nginx returns a 429 Too Many Requests without spawning a PHP process. The bot gets throttled; the server stays healthy.
Layer 2: Block Known AI Crawlers at Nginx
GPTBot and ClaudeBot are documented as respecting robots.txt, but honouring it is up to the crawler, and every request made in the meantime still costs a PHP worker. Blocking at the Nginx level takes effect immediately and never reaches PHP:
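# In the http block:
map $http_user_agent $is_ai_crawler {
    default 0;
    "~*GPTBot" 1;
    "~*ClaudeBot" 1;
    "~*Bytespider" 1;
    "~*PetalBot" 1;
    "~*CCBot" 1;
    "~*DataForSeoBot" 1;
    "~*anthropic-ai" 1;
    # Google-Extended is a robots.txt token only; no crawler sends it as a
    # User-Agent, so it is handled in robots.txt rather than here.
}

# In the site's server block:
server {
    # ... existing config ...
    if ($is_ai_crawler) {
        return 444;
    }
}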
The 444 response is Nginx-specific: it closes the connection without sending any response at all, so each blocked request costs next to nothing.
I also updated robots.txt as a courtesy signal for bots that check it before crawling:
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: Google-Extended
Disallow: /
This won't stop badly behaved bots, but it does tell the compliant ones not to collect your content for AI training, which is a reasonable default for a commercial WooCommerce store.
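If the list of bots changes often, the stanzas can be generated rather than edited by hand. This is a small sketch with a hypothetical helper name, `robots_disallow_all`; it prints one group per bot name passed as an argument.

```shell
# Print a "disallow everything" robots.txt group for each bot name given.
robots_disallow_all() {
  for bot in "$@"; do
    printf 'User-agent: %s\nDisallow: /\n\n' "$bot"
  done
}

# Example (not run here; append to the site's existing robots.txt):
#   robots_disallow_all GPTBot ClaudeBot CCBot Bytespider Google-Extended >> robots.txt
```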
Layer 3: Restrict REST API at the WordPress Level
Even with Nginx rate limiting, there's no reason for unauthenticated visitors to access most REST API endpoints. Gutenberg and WP-Admin features run from authenticated sessions and will continue to work. But WooCommerce is different — guest checkout and the block-based Cart/Checkout pages call the Store API (/wp-json/wc/store/) without a logged-in session, so a blanket restriction will break checkout for guests. If you run WooCommerce, skip straight to the route-whitelisting version below.
Add to the theme's functions.php or a site-specific plugin:
add_filter( 'rest_authentication_errors', function ( $result ) {
    // Respect a decision already made by an earlier authentication handler.
    if ( true === $result || is_wp_error( $result ) ) {
        return $result;
    }
    // Reject every request that does not come from a logged-in session.
    if ( ! is_user_logged_in() ) {
        return new WP_Error(
            'rest_forbidden',
            'REST API access restricted.',
            array( 'status' => 403 )
        );
    }
    return $result;
} );
One caveat: if your theme or a plugin uses the REST API on the public frontend — Contact Form 7, some headless setups, WooCommerce Store API for block-based checkout — this blanket restriction will break those features. For any WooCommerce store, use the route-whitelisting version instead:
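add_filter( 'rest_authentication_errors', function ( $result ) {
    // Respect a decision already made by an earlier authentication handler.
    if ( true === $result || is_wp_error( $result ) ) {
        return $result;
    }
    // Route prefixes that must stay reachable without a login.
    $allowed_routes = array(
        '/wc/store/',
        '/contact-form-7/',
    );
    // Parsed REST route, e.g. "/wc/store/v1/cart" (no query string).
    $rest_route = $GLOBALS['wp']->query_vars['rest_route'] ?? '';
    foreach ( $allowed_routes as $route ) {
        // Prefix match only: the route must start with the allowed path.
        if ( 0 === strpos( $rest_route, $route ) ) {
            return $result;
        }
    }
    if ( ! is_user_logged_in() ) {
        return new WP_Error(
            'rest_forbidden',
            'REST API access restricted.',
            array( 'status' => 403 )
        );
    }
    return $result;
} );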
This checks the parsed REST route path rather than the raw REQUEST_URI, so it can't be bypassed by stuffing a whitelisted substring into a query parameter. It keeps WooCommerce block checkout and contact forms functional while locking down everything else.
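The difference between the two approaches is easy to demonstrate outside WordPress. The sketch below mirrors the logic in shell purely as an illustration; both function names are hypothetical, and the PHP filter remains the code that runs on the site.

```shell
# Naive check: passes if a whitelisted string appears anywhere in the raw URI,
# including inside a query parameter.
naive_is_allowed() {
  case "$1" in
    */wc/store/*|*/contact-form-7/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prefix check on the parsed route, mirroring the strpos(...) === 0 test:
# the route itself must begin with a whitelisted path.
route_is_allowed() {
  case "$1" in
    /wc/store/*|/contact-form-7/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   naive_is_allowed '/wp-json/wp/v2/users?x=/wc/store/'   # succeeds (bypass)
#   route_is_allowed '/wp/v2/users'                        # fails (blocked)
#   route_is_allowed '/wc/store/v1/cart'                   # succeeds
```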
The Results
Within 24 hours of deploying all three layers:
- REST API requests dropped from 15,000/hour to under 200/hour (legitimate traffic only: logged-in sessions plus the whitelisted Store API and contact-form routes)
- Active PHP-FPM workers during off-peak dropped from 18-19 to 4-6
- Average page load time returned to 1.7 seconds
- Checkout timeout errors stopped completely
- Server CPU load average dropped from 3.8 to 0.9
The AI crawlers moved on. The scrapers hit 429s and 444s and stopped retrying. The generic Python bots got 403s from WordPress and had nothing left to scrape.
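A status-code breakdown of REST API traffic is a quick way to confirm that each layer is doing its job. The sketch below assumes the combined log format, where the status code is field 9; `rest_status_breakdown` is a hypothetical name.

```shell
# Count /wp-json/ requests per HTTP status code, lowest code first.
# Reads combined-format access log lines on stdin.
rest_status_breakdown() {
  awk '
    $7 ~ /^\/wp-json\// { seen[$9]++ }
    END { for (s in seen) printf "%s %d\n", s, seen[s] }
  ' | sort -k1,1n
}

# Example (not run here):
#   rest_status_breakdown < /var/log/nginx/access.log
```

In the output, 429 lines come from the rate limit, 444 from the user-agent block, and 403 from the WordPress filter.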
What I Now Check on Every Server
REST API abuse has overtaken XML-RPC as the most common source of unexplained PHP-FPM saturation on the servers I manage. The REST API ships enabled on every WordPress installation, with read access to public content open to unauthenticated requests by default, and most hosting providers don't rate-limit it.
If your server is running hot and you've already ruled out wp-login.php and xmlrpc.php, grep your access logs for wp-json. You might be surprised by what you find.
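As a first pass, a single number is often enough: what share of all requests is REST API traffic? The sketch below counts both the /wp-json/ prefix and the `?rest_route=` form that WordPress also accepts. It assumes the combined log format, and `rest_share` is a hypothetical name.

```shell
# Report how many requests in an access log hit the WordPress REST API,
# via either the /wp-json/ prefix or the ?rest_route= query parameter.
# Reads combined-format access log lines on stdin.
rest_share() {
  awk '
    { total++ }
    $7 ~ /^\/wp-json\// || $7 ~ /[?&]rest_route=/ { rest++ }
    END {
      if (total == 0) { print "no requests"; exit }
      printf "%d of %d requests (%.1f%%) hit the REST API\n", rest, total, 100 * rest / total
    }
  '
}

# Example (not run here):
#   rest_share < /var/log/nginx/access.log
```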
