AI Bot Blocker (robots.txt)

Choose which AI crawlers to block and get the exact robots.txt rules. Training crawlers and AI search bots are listed separately, because blocking them has very different consequences for your traffic.

The trade-off in one line: blocking AI training crawlers protects your content from model training at no traffic cost, but blocking AI search bots also removes you from AI answers and citations, and those citations send you traffic. Most sites should block training and allow AI search.

     

    What about a "noai" meta tag?

    There is no universal "noai" directive. A meta tag like <meta name="robots" content="noai"> is not a standard and major AI crawlers do not honor it, so this tool does not emit one. Robots.txt user-agent rules (above) are the working opt-out mechanism today. For non-HTML files you can mirror the same intent with an X-Robots-Tag header per bot, e.g.:
    
    X-Robots-Tag: GPTBot: noindex
    
    but support varies by bot, so treat robots.txt as the primary control and a WAF rule as the enforcement layer.

    The training vs. search trade-off

    AI crawlers are not one thing, and treating them as one thing is how sites lose traffic by accident. Training crawlers like GPTBot, ClaudeBot, and CCBot collect content to train future models. Blocking them costs you nothing measurable today: your pages still rank, still get cited, and still appear in AI search results. AI search and answer bots like OAI-SearchBot, PerplexityBot, and Claude-SearchBot are different. They power the answers users actually see, with links back to your pages. Block those and you disappear from that engine's answers, the same way blocking Googlebot removes you from Google. That is why this tool pre-checks the training group and leaves the search group alone: "block training, allow AI search" is the sane default for most sites. Deviate only when you have a specific reason, such as paywalled content you do not want summarized anywhere.
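As a rough illustration of that default, the sketch below builds robots.txt groups that disallow the training crawlers named above and leave the AI search bots unrestricted. The function name `build_robots_rules` and the two list constants are made up for this example, and the output is not necessarily the exact text this tool generates.

```python
# Bot names are the ones named in the text; the two groupings are illustrative.
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "CCBot"]
AI_SEARCH_BOTS = ["OAI-SearchBot", "PerplexityBot", "Claude-SearchBot"]


def build_robots_rules(blocked, allowed=()):
    """Return robots.txt text that disallows every bot in `blocked` site-wide.

    Bots in `allowed` get an explicit empty Disallow line, which in
    robots.txt means "nothing is disallowed for this user agent".
    """
    groups = []
    for bot in blocked:
        groups.append(f"User-agent: {bot}\nDisallow: /")
    for bot in allowed:
        groups.append(f"User-agent: {bot}\nDisallow:")
    # Groups are separated by a blank line, one group per user agent.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_robots_rules(TRAINING_BOTS, AI_SEARCH_BOTS))
```

Listing the allowed bots explicitly is optional, since a bot with no matching group is unrestricted anyway; it is included here only to make the intent visible to whoever reads the file later.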

    What robots.txt can and cannot do

    Robots.txt is a published request, not an access control. The major labs honor it, and for them the rules this tool generates are sufficient. Bots that ignore it, Bytespider being the best-documented offender, need enforcement at the network layer: a WAF or CDN rule that matches the user agent and known IP ranges and refuses the request. Cloudflare offers a one-toggle AI bot block that does exactly this. The practical setup is both layers: robots.txt as the formal opt-out that compliant bots respect, and the firewall as the wall the rest run into.
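To make the "enforcement layer" idea concrete, here is a minimal sketch of a user-agent block written as Python WSGI middleware. It is a stand-in for a real WAF or CDN rule, not a replacement for one: it matches on the user agent only, which a bot can spoof, and it does not check IP ranges. The names `DENIED_UA_SUBSTRINGS`, `is_denied`, and `BlockBotsMiddleware` are assumptions made for this example.

```python
# Hypothetical deny list: lowercase user-agent substrings for bots that
# ignore robots.txt. Bytespider is the example named in the text.
DENIED_UA_SUBSTRINGS = ("bytespider",)


def is_denied(user_agent):
    """Case-insensitive substring match against the deny list."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in DENIED_UA_SUBSTRINGS)


class BlockBotsMiddleware:
    """WSGI middleware that answers 403 to denied user agents."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if is_denied(environ.get("HTTP_USER_AGENT")):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everyone else is passed through to the wrapped application.
        return self.app(environ, start_response)
```

Wrapping an existing WSGI application would look like `app = BlockBotsMiddleware(app)`. Blocking at the CDN or firewall is still preferable where available, because the request is refused before it reaches your server.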

    Keep classic search crawlers allowed

    Googlebot and Bingbot are listed here only as a guardrail: this tool will not write blocking rules for them. Disallowing classic search crawlers removes your site from normal search results, which for almost every site is the single most expensive robots.txt mistake possible. If you want to limit what Google does with your content in Gemini, use the Google-Extended token instead, which controls AI training use without touching Search.

    Frequently asked questions

    Does robots.txt actually stop AI bots?

    Compliant ones, yes. GPTBot, ClaudeBot, CCBot, and the other crawlers from major labs publicly state they honor robots.txt, and observed behavior matches. Rogue or poorly behaved bots, no: robots.txt is a request, not a wall. For enforcement you need WAF-level blocking by user agent and IP range, for example Cloudflare's one-click AI bot blocking, which sits in front of your server and actually refuses the requests.

    Should I block GPTBot?

    It depends on which trade you want to make. GPTBot collects content for training OpenAI's models; blocking it keeps your content out of future training runs but does not remove you from ChatGPT search, which uses OAI-SearchBot and ChatGPT-User instead. Most sites block training crawlers like GPTBot while leaving the search and user-fetch bots alone, keeping the citation traffic while opting out of training.

    What is Google-Extended?

    Google-Extended is not a separate crawler. It is a robots.txt token that tells Google not to use your content for Gemini training and grounding. Your pages are still crawled by regular Googlebot, and blocking Google-Extended has no effect on Google Search crawling, indexing, or rankings. It is purely an opt-out from those AI uses.
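One way to sanity-check that a Google-Extended rule does not spill over onto Googlebot is to run the rules through a robots.txt parser. The sketch below uses Python's standard `urllib.robotparser`; the sample rules, the `allowed` helper, and the example.com URL are assumptions for illustration. Python's parser is not the one Google or any AI lab runs, so treat the result as a plausibility check rather than proof of how a given crawler will behave.

```python
from urllib.robotparser import RobotFileParser

# Sample rules for illustration: opt out of Gemini use and of GPTBot.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /
"""


def allowed(robots_txt, user_agent, url="https://example.com/article"):
    """Return True if `user_agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Google-Extended", "Googlebot", "GPTBot", "OAI-SearchBot"):
        print(agent, allowed(ROBOTS_TXT, agent))
```

With these sample rules, Google-Extended and GPTBot are reported as disallowed, while Googlebot and OAI-SearchBot remain allowed because no group names them.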

    How do I verify the blocking works?

    Two checks. First, load yourdomain.com/robots.txt in a browser and confirm the new User-agent blocks are live, since a misplaced deploy is the most common failure. Second, watch your server logs or CDN analytics for the blocked user agent strings over the following weeks: compliant bots should stop requesting pages, and any that continue are candidates for a firewall rule.
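The second check can be scripted. The sketch below counts page requests per blocked user agent in access-log lines, assuming the user-agent string appears somewhere in each line (as it does in the common "combined" log format). The function name `count_bot_hits`, the `BLOCKED_AGENTS` tuple, and the sample log lines are invented for this example. Requests for /robots.txt are skipped, because compliant bots keep fetching that file in order to read the rules.

```python
from collections import Counter

# Illustrative list; use whichever user agents you actually blocked.
BLOCKED_AGENTS = ("GPTBot", "ClaudeBot", "CCBot")


def count_bot_hits(log_lines, agents=BLOCKED_AGENTS):
    """Count log lines per agent, ignoring requests for /robots.txt."""
    hits = Counter()
    for line in log_lines:
        if "/robots.txt" in line:
            continue  # fetching robots.txt is expected, not a violation
        lowered = line.lower()
        for agent in agents:
            if agent.lower() in lowered:
                hits[agent] += 1
    return hits


if __name__ == "__main__":
    # Made-up sample lines in combined log format, for demonstration only.
    sample = [
        '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] '
        '"GET /robots.txt HTTP/1.1" 200 120 "-" "GPTBot/1.0"',
        '203.0.113.5 - - [01/Jan/2025:00:00:02 +0000] '
        '"GET /post HTTP/1.1" 200 900 "-" "GPTBot/1.0"',
    ]
    for agent, count in count_bot_hits(sample).items():
        print(agent, count)
```

In practice you would pass an open log file instead of the sample list. A count that stays above zero some weeks after the deploy marks that user agent as a candidate for a firewall rule.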