Block AI Crawlers in Robots.txt — GPTBot, ClaudeBot, Google-Extended (2026)

Up-to-date list of AI training crawlers and the exact robots.txt rules to block them. Use it to opt your content out of training for ChatGPT, Claude and Gemini, and out of future Common Crawl snapshots.

Robots.txt is the only standardized opt-out for most AI training crawlers today. The list of crawlers below is the set most commonly named in publishers' robots.txt files; each company publishes its own user-agent identifier in its docs.

Block Every Major AI Crawler

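Paste this into your robots.txt. Each crawler gets its own group below for clarity. The Robots Exclusion Protocol (RFC 9309) also allows several User-agent lines to share one set of rules, so the separate groups are a convention, not a requirement.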

# Block major AI training crawlers and search/answer bots
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-User
Disallow: /

User-agent: Claude-SearchBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Diffbot
Disallow: /
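If the list changes often, the rules can be generated instead of edited by hand. The following is a minimal Python sketch, assuming the user-agent names listed above; `AI_CRAWLERS` and `build_robots_txt` are illustrative names, not part of any existing tool.

```python
# Hypothetical helper: the list simply mirrors the block above.
AI_CRAWLERS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot",
    "ClaudeBot", "Claude-User", "Claude-SearchBot", "anthropic-ai",
    "Google-Extended", "PerplexityBot", "CCBot",
    "Applebot-Extended", "Bytespider", "Amazonbot", "Diffbot",
]


def build_robots_txt(agents):
    """Return robots.txt text with one 'Disallow: /' group per user-agent."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    # Groups are separated by a blank line, matching the layout above.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(AI_CRAWLERS), end="")
```

Redirect the output into your robots.txt, or merge it with any rules you already have for other crawlers.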

Stay Citable in ChatGPT and Claude

Some bots are user-triggered fetchers (a person asks ChatGPT or Claude a question, the assistant fetches your page to cite it) rather than training crawlers. If your goal is to opt out of training but stay visible in chat answers, leave the live-fetch user-agents allowed:

# Block training crawlers, but stay citable in ChatGPT and Claude answers.
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

# These are user-triggered or search bots, NOT training scrapers — leave them
# allowed so your pages can appear as cited sources in chat answers and search:
# User-agent: ChatGPT-User       (OpenAI, live fetch when a user asks ChatGPT to read a page)
# User-agent: OAI-SearchBot      (OpenAI, ChatGPT Search index)
# User-agent: Claude-User        (Anthropic, live fetch when a user asks Claude)
# User-agent: Claude-SearchBot   (Anthropic, Claude search index)
# User-agent: PerplexityBot      (Perplexity answers index)
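To sanity-check a selective configuration like this one before deploying it, Python's standard-library `urllib.robotparser` can evaluate the rules locally. This is a sketch only: the URL is a placeholder, `allowed_agents` is an illustrative name, and Python's parser uses its own matching logic, so a real crawler may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The selective rules from above: training crawlers blocked, nothing else listed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
"""


def allowed_agents(robots_txt, agents, url="https://example.com/article"):
    """Map each user-agent to True if the rules let it fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    agents = ["GPTBot", "ClaudeBot", "ChatGPT-User", "Claude-User", "Googlebot"]
    for agent, allowed in allowed_agents(ROBOTS_TXT, agents).items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Agents with no matching group and no `User-agent: *` group fall through to "allowed", which is why the live-fetch bots and Googlebot stay unaffected.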

Crawler Reference Table

User-agent	Owner	Purpose
`GPTBot`	OpenAI	Training data for GPT models
`ChatGPT-User`	OpenAI	Live fetches when ChatGPT reads a page in response to a user prompt
`OAI-SearchBot`	OpenAI	Indexing content for ChatGPT Search citations
`ClaudeBot`	Anthropic	Training data for Claude
`Claude-User`	Anthropic	Live fetches when a user asks Claude a question that requires reading a page
`Claude-SearchBot`	Anthropic	Indexing content for Claude search results
`anthropic-ai`	Anthropic	Deprecated training crawler identifier (legacy — still seen in older publisher robots.txt files)
`Google-Extended`	Google	Opt-out for Gemini and Vertex AI training (does NOT affect Google Search rankings)
`PerplexityBot`	Perplexity AI	Indexing for Perplexity answers
`CCBot`	Common Crawl	Open web dataset used as a training corpus by many downstream models
`Applebot-Extended`	Apple	Opt-out for Apple Intelligence training (separate from Applebot, which powers Siri/Spotlight)
`Bytespider`	ByteDance	Training crawler attributed to ByteDance
`Amazonbot`	Amazon	Indexing for Alexa and Amazon assistant features
`Diffbot`	Diffbot	Knowledge graph extraction frequently licensed by AI companies

What Robots.txt Doesn't Do

It's not enforcement. Robots.txt is an honor system. Reputable AI companies (OpenAI, Anthropic, Google, Apple, Perplexity) publicly commit to following it. Smaller scrapers and adversarial actors ignore it.
It doesn't undo past training. Blocking GPTBot today does not remove your content from a model that was trained on a snapshot taken before the block. Most providers offer no formal removal mechanism for prior training data.
It doesn't block Common Crawl downstream. Common Crawl makes its dataset public; many smaller models train on snapshots from years past. Blocking CCBot keeps your site out of future snapshots, but copies captured in earlier snapshots stay available.
It doesn't affect search rankings. Each AI training crawler is a distinct user-agent from the search engine that powers rankings. Googlebot ≠ Google-Extended; blocking the latter does not affect Google Search.

Beyond Robots.txt

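HTTP header: Add X-Robots-Tag: noai, noimageai to responses as an additional signal. The noai and noimageai directives are non-standard: some crawlers recognize them, but none is required to, so use them alongside robots.txt rather than instead of it.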
Meta tag: <meta name="robots" content="noai, noimageai"> in the HTML head conveys the same intent at the page level.
Cloudflare AI Audit / WAF: Cloudflare offers managed rules to block known AI bots at the network edge — useful when you want enforcement, not just signaling.
Terms of Service: Update your ToS to forbid use of your content for AI training. Some publishers cite this in negotiations or DMCA disputes.
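How the header gets added depends on your server or framework. As one hedged illustration, a Python site served through WSGI could attach it with a small wrapper. `add_noai_header` and `demo_app` are hypothetical names for this sketch, and the header value is the non-standard directive pair described above.

```python
def add_noai_header(app, value="noai, noimageai"):
    """Wrap a WSGI app so every response also carries an X-Robots-Tag header."""

    def wrapped(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Append rather than replace, so an existing X-Robots-Tag
            # (for example "noindex") is left untouched.
            headers = list(headers) + [("X-Robots-Tag", value)]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return wrapped


def demo_app(environ, start_response):
    """Stand-in application used only to illustrate the wrapper."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


application = add_noai_header(demo_app)
```

The same effect is usually achievable in the web server or CDN configuration, which avoids touching application code.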

Verify Your Block Is Working

Fetch https://yourdomain.com/robots.txt with a clean cache and confirm every user-agent block appears.
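Check your server logs for hits from each user-agent. After the block, well-behaved crawlers should drop off within a week. Google-Extended and Applebot-Extended are robots.txt control tokens rather than separate crawlers, so they never appear in logs; the fetching is done by Google's and Apple's regular crawlers under their usual user-agent strings.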
In Google Search Console, open Settings → robots.txt and request a recrawl to make sure Google sees the latest version.
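The first two checks can be scripted. The sketch below assumes you have already downloaded the robots.txt text and have access-log lines that include the request's user-agent string; `missing_agents` and `count_hits` are illustrative names, and the matching is a plain case-insensitive substring test, not a verification that the traffic really comes from the named company.

```python
def missing_agents(robots_txt, expected):
    """Return the expected user-agents that have no User-agent line."""
    declared = set()
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments
        field, _, value = line.partition(":")
        if field.strip().lower() == "user-agent":
            declared.add(value.strip().lower())
    return [agent for agent in expected if agent.lower() not in declared]


def count_hits(log_lines, agents):
    """Count log lines that mention each user-agent (at most once per line)."""
    counts = {agent: 0 for agent in agents}
    for line in log_lines:
        lowered = line.lower()
        for agent in agents:
            if agent.lower() in lowered:
                counts[agent] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    robots = "User-agent: GPTBot\nDisallow: /\n"
    logs = ['203.0.113.5 - - "GET / HTTP/1.1" 200 "-" "compatible; GPTBot/1.0"']
    print(missing_agents(robots, ["GPTBot", "ClaudeBot"]))
    print(count_hits(logs, ["GPTBot", "ClaudeBot"]))
```

Run the hit count on logs from before and after the change; a crawler that keeps appearing well after the block is either ignoring robots.txt or being impersonated.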

Block AI Crawlers FAQ

Does blocking Google-Extended hurt my Google Search rankings?

No. Google Search uses the Googlebot user-agent. Google-Extended is a separate identifier introduced specifically as an opt-out for Gemini and Vertex AI training. Blocking it has no effect on Google Search indexing or ranking.

What's the difference between GPTBot and ChatGPT-User?

GPTBot is OpenAI's training crawler — it fetches pages to build training data for future GPT models. ChatGPT-User fetches a page only when a user asks ChatGPT a question that requires browsing the web. If you want to opt out of training but still be cited as a source in ChatGPT answers, block GPTBot and leave ChatGPT-User allowed.

Will blocking AI crawlers stop my content from appearing in chat answers?

It depends on which user-agents you block. Blocking training crawlers (GPTBot, ClaudeBot, Google-Extended) only affects future training. User-triggered fetchers and search bots (ChatGPT-User, OAI-SearchBot, Claude-User, Claude-SearchBot, PerplexityBot) handle live answers and citations. Block these and you disappear from those products' web answers.

Why are ClaudeBot and anthropic-ai both listed?

Anthropic's current production crawler is ClaudeBot. anthropic-ai is an older identifier still seen in publisher robots.txt files and used by some downstream scrapers. Listing both costs nothing and covers older traffic.

Does robots.txt remove my content from a model that's already trained?

No. Blocking a crawler today only prevents future training snapshots from including your content. If you need removal from an already-trained model, contact the provider directly — most do not offer a formal removal mechanism, but some honor opt-out requests.

Can I block AI crawlers without giving up SEO?

Yes. AI training crawlers are distinct user-agents from search engine crawlers. Blocking GPTBot, ClaudeBot, anthropic-ai, Google-Extended and CCBot opts you out of training without touching Googlebot, Bingbot, or any other search engine crawler.
