Block AI Crawlers in Robots.txt — GPTBot, ClaudeBot, Google-Extended (2026)

Up-to-date list of AI training crawlers and the exact robots.txt rules to block them. Use it to opt your content out of training for ChatGPT, Claude and Gemini, and out of future Common Crawl snapshots.

Robots.txt is the only standardized opt-out for most AI training crawlers today. The list of crawlers below is the set most commonly named in publishers' robots.txt files; each company publishes its own user-agent identifier in its docs.

Block Every Major AI Crawler

Paste this into your robots.txt. Each crawler gets its own group here so you can remove or adjust one bot at a time; the Robots Exclusion Protocol (RFC 9309) also lets several User-agent lines share a single group of rules.

# Block major AI training crawlers and search/answer bots
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-User
Disallow: /

User-agent: Claude-SearchBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Diffbot
Disallow: /
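
If you would rather keep the crawler list in code than edit the file by hand, the groups above can be generated. The following is a minimal sketch in Python, not an official tool; the names AI_USER_AGENTS and build_block_rules are invented for this example, and the list simply mirrors the user-agents shown above.

```python
# User-agent tokens copied from the robots.txt block above.
AI_USER_AGENTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "Claude-User",
    "Claude-SearchBot", "anthropic-ai", "Google-Extended", "PerplexityBot",
    "CCBot", "Applebot-Extended", "Bytespider", "Amazonbot", "Diffbot",
]


def build_block_rules(user_agents):
    """Return robots.txt text with one 'Disallow: /' group per user-agent."""
    groups = [f"User-agent: {ua}\nDisallow: /\n" for ua in user_agents]
    # A blank line between groups keeps the file readable.
    return "\n".join(groups)


rules = build_block_rules(AI_USER_AGENTS)
print(rules)
```

Dropping a name from the list removes that crawler's group on the next run, which is easier to review than a hand-edited file.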

Stay Citable in ChatGPT and Claude

Some bots are user-triggered fetchers (a person asks ChatGPT or Claude a question, the assistant fetches your page to cite it) rather than training crawlers. If your goal is to opt out of training but stay visible in chat answers, leave the live-fetch user-agents allowed:

# Block training crawlers, but stay citable in ChatGPT and Claude answers.
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

# These are user-triggered or search bots, NOT training scrapers — leave them
# allowed so your pages can appear as cited sources in chat answers and search:
# User-agent: ChatGPT-User       (OpenAI, live fetch when a user asks ChatGPT to read a page)
# User-agent: OAI-SearchBot      (OpenAI, ChatGPT Search index)
# User-agent: Claude-User        (Anthropic, live fetch when a user asks Claude)
# User-agent: Claude-SearchBot   (Anthropic, Claude search index)
# User-agent: PerplexityBot      (Perplexity answers index)
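
Before deploying a selective policy like this, it can help to check which bots the rules actually block. The sketch below uses Python's standard urllib.robotparser as a stand-in for a real crawler; actual crawlers may match user-agents slightly differently, so treat the result as a sanity check. The helper name allowed_agents and the example.com URL are placeholders invented for this example.

```python
from urllib.robotparser import RobotFileParser

# The training-crawler rules from the block above (comments omitted).
SELECTIVE_RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
"""

TRAINING = ["GPTBot", "ClaudeBot", "anthropic-ai", "Google-Extended", "CCBot"]
LIVE_OR_SEARCH = ["ChatGPT-User", "OAI-SearchBot", "Claude-User",
                  "Claude-SearchBot", "PerplexityBot"]


def allowed_agents(robots_txt, user_agents, url="https://example.com/article"):
    """Map each user-agent token to True if the rules let it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {ua: parser.can_fetch(ua, url) for ua in user_agents}


result = allowed_agents(SELECTIVE_RULES, TRAINING + LIVE_OR_SEARCH + ["Googlebot"])
for ua, ok in result.items():
    print(f"{ua:18} {'allowed' if ok else 'blocked'}")
```

With these rules the five training crawlers should report as blocked, while the live-fetch and search bots, and Googlebot, stay allowed because no group names them.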

Crawler Reference Table

User-agent        | Owner         | Purpose
GPTBot            | OpenAI        | Training data for GPT models
ChatGPT-User      | OpenAI        | Live fetches when ChatGPT reads a page in response to a user prompt
OAI-SearchBot     | OpenAI        | Indexing content for ChatGPT Search citations
ClaudeBot         | Anthropic     | Training data for Claude
Claude-User       | Anthropic     | Live fetches when a user asks Claude a question that requires reading a page
Claude-SearchBot  | Anthropic     | Indexing content for Claude search results
anthropic-ai      | Anthropic     | Deprecated training crawler identifier (legacy; still seen in older publisher robots.txt files)
Google-Extended   | Google        | Opt-out for Gemini and Vertex AI training (does NOT affect Google Search rankings)
PerplexityBot     | Perplexity AI | Indexing for Perplexity answers
CCBot             | Common Crawl  | Open web dataset used as a training corpus by many downstream models
Applebot-Extended | Apple         | Opt-out for Apple Intelligence training (separate from Applebot, which powers Siri/Spotlight)
Bytespider        | ByteDance     | Training crawler attributed to ByteDance
Amazonbot         | Amazon        | Indexing for Alexa and Amazon assistant features
Diffbot           | Diffbot       | Knowledge graph extraction frequently licensed by AI companies

What Robots.txt Doesn't Do

  • It's not enforcement. Robots.txt is an honor system. Reputable AI companies (OpenAI, Anthropic, Google, Apple, Perplexity) publicly commit to following it. Smaller scrapers and adversarial actors ignore it.
  • It doesn't undo past training. Blocking GPTBot today does not remove your content from a model that was trained on a snapshot taken before the block. Most providers offer no formal removal mechanism for prior training data.
  • It doesn't block Common Crawl downstream. Common Crawl publishes its dataset, and many smaller models train on snapshots from years past. Blocking CCBot only keeps your site out of future snapshots; copies in snapshots that are already published remain available.
  • It doesn't affect search rankings. Each AI training opt-out uses a user-agent distinct from the search crawler that feeds rankings. Googlebot ≠ Google-Extended; blocking the latter does not affect Google Search.

Beyond Robots.txt

  • HTTP header: Add X-Robots-Tag: noai, noimageai to responses as an additional signal. The noai and noimageai directives are non-standard, and only some crawlers recognize them.
  • Meta tag: <meta name="robots" content="noai, noimageai"> in the HTML head conveys the same intent at the page level.
  • Cloudflare AI Audit / WAF: Cloudflare offers managed rules to block known AI bots at the network edge — useful when you want enforcement, not just signaling.
  • Terms of Service: Update your ToS to forbid use of your content for AI training. Some publishers cite this in negotiations or DMCA disputes.
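
The HTTP header option is normally set in web server or CDN configuration. For a Python application, one possible approach is a small WSGI wrapper, sketched below. It assumes your app is served through WSGI; add_noai_header and demo_app are hypothetical names invented for this example, and the header value is the non-standard one described above.

```python
def add_noai_header(app, value="noai, noimageai"):
    """Wrap a WSGI app so every response carries an X-Robots-Tag header."""
    def wrapped(environ, start_response):
        def start_with_header(status, headers, exc_info=None):
            # Append rather than replace, so an existing X-Robots-Tag
            # (for example "noindex") is left in place.
            headers = list(headers) + [("X-Robots-Tag", value)]
            return start_response(status, headers, exc_info)
        return app(environ, start_with_header)
    return wrapped


def demo_app(environ, start_response):
    """Tiny stand-in application used only to demonstrate the wrapper."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


app = add_noai_header(demo_app)
```

Because the directive is non-standard, this only adds a signal for crawlers that choose to read it; it does not enforce anything.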

Verify Your Block Is Working

  1. Fetch https://yourdomain.com/robots.txt with a clean cache and confirm every user-agent block appears.
  2. Check your server logs for hits from each user-agent. After the block, well-behaved crawlers should drop off within a week.
  3. Re-fetch in Google Search Console → Settings → robots.txt to make sure Google sees the latest version.
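
The log check in step 2 can be scripted. The sketch below counts requests per crawler, assuming an access log in the common "combined" format where the user-agent is the last quoted field. The helper names and the sample log lines are made up for illustration, and the user-agent strings in them are simplified. Google-Extended and Applebot-Extended are left out because, as far as their vendors document them, they are robots.txt control tokens rather than crawlers with their own request user-agent.

```python
from collections import Counter

# Tokens to look for in the User-Agent field (from the list in this article).
AI_TOKENS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "Claude-User",
    "Claude-SearchBot", "anthropic-ai", "PerplexityBot", "CCBot",
    "Bytespider", "Amazonbot", "Diffbot",
]


def user_agent_field(log_line):
    """Return the last double-quoted field of a combined-format log line."""
    parts = log_line.rsplit('"', 2)
    return parts[1] if len(parts) == 3 else ""


def count_ai_hits(log_lines, tokens=AI_TOKENS):
    """Count requests per AI crawler token, matching case-insensitively."""
    counts = Counter()
    for line in log_lines:
        agent = user_agent_field(line).lower()
        for token in tokens:
            if token.lower() in agent:
                counts[token] += 1
                break  # attribute each request to at most one token
    return counts


# Made-up sample lines; real user-agent strings are longer.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2026:00:00:01 +0000] "GET /post HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [01/Jan/2026:00:00:07 +0000] "GET /robots.txt HTTP/1.1" 200 300 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '198.51.100.2 - - [01/Jan/2026:00:01:10 +0000] "GET /GPTBot-guide HTTP/1.1" 200 900 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '203.0.113.5 - - [01/Jan/2026:00:02:00 +0000] "GET /about HTTP/1.1" 200 410 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
]

hits = count_ai_hits(SAMPLE_LOG)
for token, n in hits.most_common():
    print(token, n)
```

Requests for /robots.txt itself are expected to continue after the block, since a crawler has to read the file in order to honor it; what should drop off are fetches of your other pages.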

Block AI Crawlers FAQ

Does blocking Google-Extended hurt my Google Search rankings?

No. Google Search uses the Googlebot user-agent. Google-Extended is a separate identifier introduced specifically as an opt-out for Gemini and Vertex AI training. Blocking it has no effect on Google Search indexing or ranking.

What's the difference between GPTBot and ChatGPT-User?

GPTBot is OpenAI's training crawler — it fetches pages to build training data for future GPT models. ChatGPT-User fetches a page only when a user asks ChatGPT a question that requires browsing the web. If you want to opt out of training but still be cited as a source in ChatGPT answers, block GPTBot and leave ChatGPT-User allowed.

Will blocking AI crawlers stop my content from appearing in chat answers?

It depends which user-agents you block. Training crawlers (GPTBot, ClaudeBot, Google-Extended) only affect future training. User-triggered fetchers and search bots (ChatGPT-User, OAI-SearchBot, Claude-User, Claude-SearchBot, PerplexityBot) handle live answers and citations — block these and you disappear from those products' web answers.

Why are ClaudeBot and anthropic-ai both listed?

Anthropic's current production crawler is ClaudeBot. anthropic-ai is an older identifier still seen in publisher robots.txt files and used by some downstream scrapers. Listing both costs nothing and covers older traffic.

Does robots.txt remove my content from a model that's already trained?

No. Blocking a crawler today only prevents future training snapshots from including your content. If you need removal from an already-trained model, contact the provider directly — most do not offer a formal removal mechanism, but some honor opt-out requests.

Can I block AI crawlers without giving up SEO?

Yes. AI training crawlers are distinct user-agents from search engine crawlers. Blocking GPTBot, ClaudeBot, anthropic-ai, Google-Extended and CCBot opts you out of training without touching Googlebot, Bingbot, or any other search index.