Question 1

Does robots.txt actually stop AI bots?

Accepted Answer

Compliant ones, yes. GPTBot, ClaudeBot, CCBot, and the other crawlers from major labs publicly state they honor robots.txt, and observed behavior matches. Rogue or poorly behaved bots, no: robots.txt is a request, not a wall. For enforcement you need WAF-level blocking by user agent and IP range, for example Cloudflare's one-click AI bot blocking, which sits in front of your server and actually refuses the requests.
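To make the difference between a request and a refusal concrete, here is a minimal sketch of user-agent blocking done at the application layer as a WSGI middleware. It is only a stand-in for a real WAF or CDN rule, not a description of how Cloudflare implements its feature. The names `block_ai_bots`, `hello_app`, `status_for`, and the token list are assumptions made for this example, and the user agent strings are illustrative rather than the crawlers' exact strings.

```python
# Hypothetical blocklist of lowercase user-agent tokens (an assumption for this sketch).
BLOCKED_AGENT_TOKENS = ("gptbot", "claudebot", "ccbot")


def block_ai_bots(app, blocked=BLOCKED_AGENT_TOKENS):
    """Wrap a WSGI app and answer 403 when the User-Agent contains a blocked token."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in blocked):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everything else is passed through to the wrapped application.
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    """Stand-in application used only for this example."""
    body = b"Hello\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def status_for(app, user_agent):
    """Call a WSGI app directly and return the status line it sends."""
    seen = []
    app({"HTTP_USER_AGENT": user_agent}, lambda status, headers: seen.append(status))
    return seen[0]


protected = block_ai_bots(hello_app)
print(status_for(protected, "Mozilla/5.0 (compatible; GPTBot/1.0)"))
print(status_for(protected, "Mozilla/5.0 (X11; Linux x86_64)"))
```

The User-Agent header is supplied by the client, so a bot that lies about it passes this check. That is why the answer above pairs user agent matching with IP ranges.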

Question 2

Should I block GPTBot?

Accepted Answer

It depends on which trade you want to make. GPTBot collects content for training OpenAI's models; blocking it keeps your content out of future training runs but does not remove you from ChatGPT search, which uses OAI-SearchBot and ChatGPT-User instead. A common setup blocks training crawlers like GPTBot while leaving the search and user-fetch bots alone, which keeps the citation traffic while opting out of training.
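As a rough illustration of that split, the sketch below embeds a hypothetical robots.txt and checks it with Python's standard `urllib.robotparser`. The file contents, the `allowed` helper, and the example.com URL are assumptions for the example. Python's parser is also not the parser any of these crawlers use, so treat the result as a sanity check of the rules, not a guarantee of bot behavior.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of training, keep search and user fetches.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /
"""


def allowed(robots_txt, user_agent, url="https://example.com/article"):
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for bot in ("GPTBot", "OAI-SearchBot", "ChatGPT-User"):
    print(bot, "allowed" if allowed(ROBOTS_TXT, bot) else "blocked")
```

The explicit Allow groups are optional, since a bot with no matching group is allowed by default. They are written out here only to make the intent visible to whoever reads the file later.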

Question 3

What is Google-Extended?

Accepted Answer

Google-Extended is not a separate crawler. It is a robots.txt token that tells Google not to use your content for Gemini training and grounding. Your pages are still crawled by regular Googlebot, and blocking Google-Extended has no effect on Google Search crawling, indexing, or rankings. It is purely an opt-out from those AI uses.
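A short sketch of what the opt-out looks like in practice: the hypothetical helper `opt_out_group` builds the robots.txt group for one token, and `urllib.robotparser` confirms that the group does not match Googlebot. Since Google-Extended never sends requests of its own, the first check only shows that the rule is addressed to that token. The URL is a placeholder, and Python's matching rules are an approximation of Google's, not a copy.

```python
from urllib.robotparser import RobotFileParser


def opt_out_group(token):
    """Return a robots.txt group that disallows everything for one token."""
    return f"User-agent: {token}\nDisallow: /\n"


robots_txt = opt_out_group("Google-Extended")
print(robots_txt)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

page = "https://example.com/pricing"  # placeholder URL
print("Google-Extended:", parser.can_fetch("Google-Extended", page))
print("Googlebot:", parser.can_fetch("Googlebot", page))
```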

Question 4

How do I verify the blocking works?

Accepted Answer

Two checks. First, load yourdomain.com/robots.txt in a browser and confirm the new User-agent blocks are live, since a misplaced deploy is the most common failure. Second, watch your server logs or CDN analytics for the blocked user agent strings over the following weeks: compliant bots should stop requesting pages, and any that continue are candidates for a firewall rule.
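The second check can be sketched as a small log scan. This assumes the common "combined" access log format, where the request line and the user agent are the first and third quoted fields. The token list, the `count_blocked_hits` name, and the sample lines are all made up for illustration. Requests for /robots.txt itself are skipped, because a compliant bot has to keep fetching that file to read the rules.

```python
from collections import Counter

# Tokens to look for in the User-Agent field (an assumption; use your own list).
BLOCKED_TOKENS = ("GPTBot", "ClaudeBot", "CCBot")


def count_blocked_hits(log_lines, tokens=BLOCKED_TOKENS):
    """Count page requests per blocked token, ignoring fetches of /robots.txt."""
    hits = Counter()
    for line in log_lines:
        parts = line.split('"')
        if len(parts) < 6:
            continue  # not in the assumed combined log format
        request, user_agent = parts[1], parts[5].lower()
        fields = request.split()
        path = fields[1] if len(fields) > 1 else ""
        if path == "/robots.txt":
            continue  # reading the rules is expected, not a violation
        for token in tokens:
            if token.lower() in user_agent:
                hits[token] += 1
                break
    return hits


# Hypothetical log lines using documentation-only IP addresses.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET /post HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [01/Jan/2025:00:00:02 +0000] "GET /robots.txt HTTP/1.1" 200 64 "-" "CCBot/2.0"',
    '192.0.2.9 - - [01/Jan/2025:00:00:03 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

for token, count in count_blocked_hits(SAMPLE_LOG).items():
    print(token, count)
```

A token that still shows page requests some weeks after the robots.txt change is the kind of candidate for a firewall rule that the answer describes.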

AI Bot Blocker (robots.txt)
