Block AI Crawlers in Robots.txt — GPTBot, ClaudeBot, Google-Extended (2026)
Up-to-date list of AI training crawlers and the exact robots.txt rules to block them. Opts your content out of training for ChatGPT, Claude and Gemini, and out of future Common Crawl snapshots.
Robots.txt is the only standardized opt-out for most AI training crawlers today. The list of crawlers below is the set most commonly named in publishers' robots.txt files; each company publishes its own user-agent identifier in its docs.
Block Every Major AI Crawler
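Paste this into your robots.txt. Each crawler gets its own group below, which is the most widely compatible form. The Robots Exclusion Protocol (RFC 9309) also allows several User-agent lines to share one set of rules, so a single combined group is valid too.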
    # Block major AI training crawlers and search/answer bots
    User-agent: GPTBot
    Disallow: /

    User-agent: ChatGPT-User
    Disallow: /

    User-agent: OAI-SearchBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    User-agent: Claude-User
    Disallow: /

    User-agent: Claude-SearchBot
    Disallow: /

    User-agent: anthropic-ai
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    User-agent: PerplexityBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    User-agent: Applebot-Extended
    Disallow: /

    User-agent: Bytespider
    Disallow: /

    User-agent: Amazonbot
    Disallow: /

    User-agent: Diffbot
    Disallow: /
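If you would rather generate the rules than paste them, the same output can be produced from a list of user-agent names. The following is a minimal Python sketch, not the generator linked from this page; `AI_CRAWLERS` and `build_block_rules` are hypothetical names introduced only for illustration.

```python
# Hypothetical helper: builds one "Disallow: /" group per user-agent.
# The names in AI_CRAWLERS are the ones listed in this article.
AI_CRAWLERS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "Claude-User",
    "Claude-SearchBot", "anthropic-ai", "Google-Extended", "PerplexityBot",
    "CCBot", "Applebot-Extended", "Bytespider", "Amazonbot", "Diffbot",
]


def build_block_rules(user_agents):
    """Return robots.txt text with a separate blocking group for each name."""
    groups = [f"User-agent: {ua}\nDisallow: /" for ua in user_agents]
    # Groups are separated by a blank line; the file ends with a newline.
    return "\n\n".join(groups) + "\n"


rules = build_block_rules(AI_CRAWLERS)
print(rules)
```

Keeping the names in one list makes it easy to add or drop a crawler later and regenerate the file.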
Stay Citable in ChatGPT and Claude
Some bots are user-triggered fetchers (a person asks ChatGPT or Claude a question, the assistant fetches your page to cite it) rather than training crawlers. If your goal is to opt out of training but stay visible in chat answers, leave the live-fetch user-agents allowed:
    # Block training crawlers, but stay citable in ChatGPT and Claude answers.
    User-agent: GPTBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    User-agent: anthropic-ai
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    User-agent: CCBot
    Disallow: /

    # These are user-triggered or search bots, NOT training scrapers; leave them
    # allowed so your pages can appear as cited sources in chat answers and search:
    # User-agent: ChatGPT-User (OpenAI, live fetch when a user asks ChatGPT to read a page)
    # User-agent: OAI-SearchBot (OpenAI, ChatGPT Search index)
    # User-agent: Claude-User (Anthropic, live fetch when a user asks Claude)
    # User-agent: Claude-SearchBot (Anthropic, Claude search index)
    # User-agent: PerplexityBot (Perplexity answers index)
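To sanity-check a rule set like this before deploying it, you can run it through Python's standard-library `urllib.robotparser`. This is a rough sketch: the URL `https://example.com/article` is a placeholder, and the standard-library parser only approximates how each vendor's crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The training-only block list from this section, as a string.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

TRAINING = ["GPTBot", "ClaudeBot", "anthropic-ai", "Google-Extended", "CCBot"]
LIVE_OR_SEARCH = ["ChatGPT-User", "OAI-SearchBot", "Claude-User",
                  "Claude-SearchBot", "PerplexityBot"]

URL = "https://example.com/article"  # placeholder URL for the check

# Collect which names the parser considers blocked or allowed.
blocked = [ua for ua in TRAINING if not parser.can_fetch(ua, URL)]
allowed = [ua for ua in LIVE_OR_SEARCH if parser.can_fetch(ua, URL)]

print("blocked:", blocked)
print("allowed:", allowed)
```

If a training crawler shows up as allowed, or a live-fetch agent as blocked, the file has a typo or a stray wildcard group.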
Crawler Reference Table
| User-agent | Owner | Purpose |
|---|---|---|
| GPTBot | OpenAI | Training data for GPT models |
| ChatGPT-User | OpenAI | Live fetches when ChatGPT reads a page in response to a user prompt |
| OAI-SearchBot | OpenAI | Indexing content for ChatGPT Search citations |
| ClaudeBot | Anthropic | Training data for Claude |
| Claude-User | Anthropic | Live fetches when a user asks Claude a question that requires reading a page |
| Claude-SearchBot | Anthropic | Indexing content for Claude search results |
| anthropic-ai | Anthropic | Deprecated training crawler identifier (legacy; still seen in older publisher robots.txt files) |
| Google-Extended | Google | Opt-out for Gemini and Vertex AI training (does NOT affect Google Search rankings) |
| PerplexityBot | Perplexity AI | Indexing for Perplexity answers |
| CCBot | Common Crawl | Open web dataset used as a training corpus by many downstream models |
| Applebot-Extended | Apple | Opt-out for Apple Intelligence training (separate from Applebot, which powers Siri/Spotlight) |
| Bytespider | ByteDance | Training crawler attributed to ByteDance |
| Amazonbot | Amazon | Indexing for Alexa and Amazon assistant features |
| Diffbot | Diffbot | Knowledge graph extraction frequently licensed by AI companies |
What Robots.txt Doesn't Do
- It's not enforcement. Robots.txt is an honor system. Reputable AI companies (OpenAI, Anthropic, Google, Apple, Perplexity) publicly commit to following it. Smaller scrapers and adversarial actors ignore it.
- It doesn't undo past training. Blocking GPTBot today does not remove your content from a model that was trained on a snapshot taken before the block. Most providers offer no formal removal mechanism for prior training data.
- It doesn't block Common Crawl downstream. Common Crawl makes its dataset public; many smaller models train on snapshots from years past. Blocking CCBot stops future snapshots from including your site.
- It doesn't affect search rankings. Each AI training crawler is a distinct user-agent from the search engine that powers rankings. Googlebot ≠ Google-Extended; blocking the latter does not affect Google Search.
Beyond Robots.txt
- HTTP header: Add X-Robots-Tag: noai, noimageai to responses for stronger signaling. The noai directive is non-standard but recognized by some crawlers.
- Meta tag: <meta name="robots" content="noai, noimageai"> in the HTML head conveys the same intent at the page level.
- Cloudflare AI Audit / WAF: Cloudflare offers managed rules to block known AI bots at the network edge, which is useful when you want enforcement, not just signaling.
- Terms of Service: Update your ToS to forbid use of your content for AI training. Some publishers cite this in negotiations or DMCA disputes.
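As an illustration of the HTTP header option, here is a minimal sketch of a WSGI middleware that appends the header to every response. It assumes your site runs as a Python WSGI application; `add_noai_header` and `demo_app` are hypothetical names, and the `noai, noimageai` value is the non-standard directive described above. On other stacks the equivalent is usually a one-line server or CDN setting.

```python
def add_noai_header(app, value="noai, noimageai"):
    """Wrap a WSGI app so every response carries an X-Robots-Tag header."""
    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            # Append rather than replace, so any existing X-Robots-Tag
            # directives (for example noindex) are preserved.
            new_headers = list(headers) + [("X-Robots-Tag", value)]
            return start_response(status, new_headers, exc_info)
        return app(environ, patched_start)
    return wrapped


def demo_app(environ, start_response):
    """Stand-in application used only to demonstrate the wrapper."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


app = add_noai_header(demo_app)
```

The wrapped `app` can be served by any WSGI server in place of the original application.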
Verify Your Block Is Working
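- Fetch https://yourdomain.com/robots.txt with a clean cache and confirm every user-agent block appears.
- Check your server logs for hits from each user-agent. After the block, well-behaved crawlers should drop off within a week. Google-Extended and Applebot-Extended are robots.txt control tokens rather than separate crawlers, so they will not appear in logs under those names.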
- Re-fetch in Google Search Console → Settings → robots.txt to make sure Google sees the latest version.
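The log check can be scripted. The sketch below counts access-log lines that mention each crawler name, assuming a log format in which the request's User-Agent string appears somewhere in the line (as in the common "combined" format). `count_ai_hits` and the sample lines are illustrative, not real traffic. Google-Extended and Applebot-Extended are left out because they are robots.txt control tokens and do not show up as request user-agents.

```python
from collections import Counter

# Crawler names from this article that identify themselves in requests.
AI_TOKENS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "Claude-User",
    "Claude-SearchBot", "anthropic-ai", "PerplexityBot", "CCBot",
    "Bytespider", "Amazonbot", "Diffbot",
]


def count_ai_hits(log_lines, tokens=AI_TOKENS):
    """Count log lines whose text contains a known AI crawler name."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for token in tokens:
            if token.lower() in lowered:
                hits[token] += 1
                break  # count each request once
    return hits


# Made-up sample lines in combined log format.
sample = [
    '203.0.113.5 - - [01/Jan/2026:00:00:01 +0000] "GET /post HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.1)"',
    '203.0.113.9 - - [01/Jan/2026:00:00:02 +0000] "GET /post HTTP/1.1" 200 512 "-" "CCBot/2.0"',
    '198.51.100.7 - - [01/Jan/2026:00:00:03 +0000] "GET / HTTP/1.1" 200 900 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/125.0"',
]
hits = count_ai_hits(sample)
print(dict(hits))
```

Run it against a log slice from before the block and one from a week or so after, and compare the counts per crawler.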
Block AI Crawlers FAQ
Does blocking Google-Extended hurt my Google Search rankings?
No. Google Search uses the Googlebot user-agent. Google-Extended is a separate identifier introduced specifically as an opt-out for Gemini and Vertex AI training. Blocking it has no effect on Google Search indexing or ranking.
What's the difference between GPTBot and ChatGPT-User?
GPTBot is OpenAI's training crawler — it fetches pages to build training data for future GPT models. ChatGPT-User fetches a page only when a user asks ChatGPT a question that requires browsing the web. If you want to opt out of training but still be cited as a source in ChatGPT answers, block GPTBot and leave ChatGPT-User allowed.
Will blocking AI crawlers stop my content from appearing in chat answers?
It depends which user-agents you block. Training crawlers (GPTBot, ClaudeBot, Google-Extended) only affect future training. User-triggered fetchers and search bots (ChatGPT-User, OAI-SearchBot, Claude-User, Claude-SearchBot, PerplexityBot) handle live answers and citations — block these and you disappear from those products' web answers.
Why are ClaudeBot and anthropic-ai both listed?
Anthropic's current production crawler is ClaudeBot. anthropic-ai is an older identifier still seen in publisher robots.txt files and used by some downstream scrapers. Listing both costs nothing and covers older traffic.
Does robots.txt remove my content from a model that's already trained?
No. Blocking a crawler today only prevents future training snapshots from including your content. If you need removal from an already-trained model, contact the provider directly — most do not offer a formal removal mechanism, but some honor opt-out requests.
Can I block AI crawlers without giving up SEO?
Yes. AI training crawlers are distinct user-agents from search engine crawlers. Blocking GPTBot, ClaudeBot, anthropic-ai, Google-Extended and CCBot opts you out of training without touching Googlebot, Bingbot, or any other search index.