Robots.txt Generator

Free robots.txt generator with visual rules, AI-crawler blocking (GPTBot, ClaudeBot, Google-Extended), sitemap support, and ready-made templates for WordPress, Shopify, Next.js and more.


How to Use the Robots.txt Generator

  1. Pick a user-agent. * applies the rules to every crawler. Choose a specific bot (Googlebot, Bingbot, GPTBot, ClaudeBot) to target it individually.
  2. Add paths to Disallow, one per line. Start each path with a /, e.g. /admin/, /cart, /private/*.pdf.
  3. Optionally add Allow rules for sub-paths you want crawlable inside a disallowed directory.
  4. Paste your sitemap URL (e.g. https://example.com/sitemap.xml) so bots can find your pages faster.
  5. Toggle Block AI crawlers to add rules for GPTBot, ClaudeBot, Google-Extended and CCBot in one click.
  6. Copy the generated file and upload it to your website root as /robots.txt.
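The six steps above come down to turning a few lists of paths into text. Below is a minimal Python sketch of that rendering step. It is not the generator's actual code, and the names `Group` and `build_robots_txt` are hypothetical. It enforces the "start each path with /" rule from step 2 and places one blank line between user-agent groups.

```python
from collections.abc import Sequence
from dataclasses import dataclass, field


@dataclass
class Group:
    """One User-agent group (hypothetical helper); rules apply to every agent listed."""
    user_agents: list[str]
    disallow: list[str] = field(default_factory=list)
    allow: list[str] = field(default_factory=list)


def build_robots_txt(groups: Sequence[Group], sitemaps: Sequence[str] = ()) -> str:
    """Render groups and sitemap URLs as robots.txt text (illustrative sketch)."""
    lines: list[str] = []
    for group in groups:
        for path in [*group.allow, *group.disallow]:
            # Step 2 of the guide: every path should start with "/"
            if not path.startswith("/"):
                raise ValueError(f"path must start with '/': {path!r}")
        lines += [f"User-agent: {agent}" for agent in group.user_agents]
        # Allow lines first: irrelevant under Google's longest-match rule,
        # but helpful for simpler parsers that stop at the first matching rule
        lines += [f"Allow: {path}" for path in group.allow]
        lines += [f"Disallow: {path}" for path in group.disallow]
        lines.append("")  # a blank line separates groups
    lines += [f"Sitemap: {url}" for url in sitemaps]
    return "\n".join(lines).rstrip("\n") + "\n"


print(build_robots_txt(
    [Group(["*"], disallow=["/admin/", "/cart"], allow=["/admin/help/"])],
    sitemaps=["https://example.com/sitemap.xml"],
))
```

The Sitemap lines go at the end because they do not belong to any user-agent group. Crawlers read them no matter where they appear in the file.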

What Is a Robots.txt File?

A robots.txt file is a plain text file placed at the root of your domain (example.com/robots.txt) that tells web crawlers which URLs they may request from your site. It follows the Robots Exclusion Protocol (RFC 9309) and is respected by all major search engines (Google, Bing, Yandex, DuckDuckGo) and most AI crawlers. Compliance is voluntary, though: a badly behaved bot can simply ignore it.

Well-behaved crawlers fetch robots.txt before requesting any other URL on your domain, so it is the fastest way to:

  • Keep low-value URLs (admin panels, cart pages, internal search results) from eating into your crawl budget
  • Point crawlers at your sitemap.xml for faster discovery of new pages
  • Block AI training crawlers from scraping your content
  • Throttle aggressive bots with Crawl-delay (honored by Bing, ignored by Googlebot)
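To see how a polite crawler reads these directives, here is a short sketch using Python's standard-library `urllib.robotparser`. The file contents and the agent name `MyBot` are made up for illustration. This parser does plain prefix matching, which is enough for the simple rules below.

```python
from urllib.robotparser import RobotFileParser

# Example file (illustrative values only)
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /search
Crawl-delay: 5

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
# parse() takes an iterable of lines; read() would fetch the file over HTTP instead
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("MyBot", "https://example.com/admin/users"))  # False
print(parser.can_fetch("MyBot", "https://example.com/blog/post"))    # True
print(parser.crawl_delay("MyBot"))  # 5 (seconds between requests)
print(parser.site_maps())           # ['https://example.com/sitemap.xml']
```

A real crawler would sleep for `crawl_delay()` seconds between requests and queue the URLs listed in each sitemap.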

Robots.txt Syntax Cheat Sheet

# Apply rules to every crawler
User-agent: *

# Block a directory
Disallow: /admin/

# Block a specific file
Disallow: /private/report.pdf

# Block all URLs ending in .pdf
Disallow: /*.pdf$

# Allow a sub-path inside a disallowed directory
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

# Point bots at your sitemap
Sitemap: https://example.com/sitemap.xml

Wildcards: * matches any sequence of characters, and $ anchors the pattern to the end of the URL. Both are defined in RFC 9309 and are supported by Googlebot, Bingbot and most modern crawlers.
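Python's built-in `urllib.robotparser` does plain prefix matching, ignores `*` and `$` inside paths, and stops at the first matching rule. The sketch below instead follows the RFC 9309 approach: the longest matching pattern wins, and Allow wins a tie. It is a simplified illustration (pattern length is counted in characters rather than octets), and `pattern_to_regex` and `is_allowed` are hypothetical names.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any sequence of characters; a trailing '$' anchors the
    pattern to the end of the URL. Everything else is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Decide Allow/Disallow for one user-agent group.

    rules are (directive, pattern) pairs such as ("Disallow", "/wp-admin/");
    path is the URL path plus any query string. The longest matching
    pattern wins; on a tie, Allow is preferred.
    """
    best_len, allowed = -1, True  # no matching rule means crawling is allowed
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty "Disallow:" blocks nothing
        if pattern_to_regex(pattern).match(path):  # .match anchors at the start
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed


rules = [
    ("Disallow", "/wp-admin/"),
    ("Allow", "/wp-admin/admin-ajax.php"),
    ("Disallow", "/*.pdf$"),
]
print(is_allowed(rules, "/wp-admin/admin-ajax.php"))
print(is_allowed(rules, "/files/report.pdf"))
```

The longest-match rule is why the cheat sheet's `Allow: /wp-admin/admin-ajax.php` can come after `Disallow: /wp-admin/` and still take effect. The order of the lines does not matter, only the length of the pattern.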

Ready-made Templates

WordPress → full WordPress guide

Block /wp-admin/ and other internal WP URLs, but keep admin-ajax.php reachable so the frontend can still call AJAX endpoints. The WordPress guide covers Yoast and Rank Math setup, WooCommerce extras, and the /wp-content/uploads/ mistake to avoid.

Shopify → full Shopify guide

Block cart, checkout, account and the sort_by filtered URLs that duplicate your collection pages. Shopify auto-generates a baseline robots.txt — see the Shopify guide for the robots.txt.liquid template that preserves Shopify's defaults while adding your rules.

Next.js → full Next.js guide

Block /api/ if your endpoints don't need indexing. Don't block /_next/ — Google needs the JS and CSS to render. The Next.js guide covers app/robots.ts, the Pages Router rewrite trick, and conditional preview blocking.

Astro → full Astro guide

Static public/robots.txt for fixed rules; src/pages/robots.txt.ts endpoint when you need build-time logic. The Astro guide shows both with i18n examples.

Block AI Crawlers → full AI crawler list

Stop GPTBot, ClaudeBot, Google-Extended and CCBot from training on your content. The AI crawler guide has the up-to-date list of 14 known training and search bots, plus the trick to stay citable in ChatGPT and Claude answers.

Allow All / Block All

Use Allow All for brand-new sites to maximize crawling. Use Block All for staging environments you never want crawled. Robots.txt is not a security boundary, so prefer HTTP auth for real access control, and use X-Robots-Tag: noindex on crawlable URLs to keep them out of the index.
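One common way to keep staging and preview deploys on Block All is to choose the file at build time from an environment variable. The sketch below assumes a hypothetical `SITE_ENV` variable. Substitute whatever your host or framework actually sets.

```python
import os
from typing import Optional

ALLOW_ALL = "User-agent: *\nDisallow:\n"    # an empty Disallow blocks nothing
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # "/" matches every path


def robots_for_env(env: Optional[str] = None) -> str:
    """Return Allow All only for production and Block All everywhere else.

    SITE_ENV is a hypothetical variable name used for illustration.
    """
    if env is None:
        env = os.environ.get("SITE_ENV", "production")
    return ALLOW_ALL if env == "production" else BLOCK_ALL


print(robots_for_env("staging"))
```

Falling back to "production" when the variable is missing is a judgment call. If an accidentally indexed staging site worries you more than an accidentally blocked production site, reverse the default.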

Block AI Crawlers (GPTBot, ClaudeBot, Google-Extended)

If you don't want your content used to train large language models, block these user-agents explicitly. Robots.txt is the only standardized opt-out for most AI companies today:

  • GPTBot — OpenAI's training crawler (ChatGPT)
  • ClaudeBot / anthropic-ai — Anthropic (Claude)
  • Google-Extended — Google's AI training pipeline (Gemini, Vertex AI). Note: this does not affect Googlebot or your Google Search rankings.
  • CCBot — Common Crawl (seed data for many models)
  • PerplexityBot, Bytespider, Applebot-Extended — other known AI crawlers

Click the Block AI crawlers checkbox to emit all of these rules in one shot.
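The sketch below shows the kind of output such a toggle can produce: one `Disallow: /` group per AI agent, followed by a catch-all group that keeps every other crawler allowed. It then checks the result with Python's `urllib.robotparser`. The agent list mirrors the bullets above. The exact rules this generator emits may differ, and `ai_block_rules` is a hypothetical name.

```python
from collections.abc import Sequence
from urllib.robotparser import RobotFileParser

# The user-agent tokens listed above (tokens, not full User-Agent header strings)
AI_AGENTS = ("GPTBot", "ClaudeBot", "anthropic-ai", "Google-Extended",
             "CCBot", "PerplexityBot", "Bytespider", "Applebot-Extended")


def ai_block_rules(agents: Sequence[str] = AI_AGENTS) -> str:
    """One 'Disallow: /' group per AI agent, separated by blank lines."""
    return "\n\n".join(f"User-agent: {a}\nDisallow: /" for a in agents) + "\n"


# Append a catch-all group so every other crawler stays allowed
robots_txt = ai_block_rules() + "\nUser-agent: *\nAllow: /\n"

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
for agent in ["GPTBot", "Google-Extended", "Googlebot"]:
    # robotparser matches on the product token, e.g. "GPTBot"
    print(agent, parser.can_fetch(agent, "https://example.com/blog/post"))
```

This prints False for GPTBot and Google-Extended and True for Googlebot, which confirms that the AI opt-out leaves regular search crawling untouched.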

Where to Upload Your Robots.txt File

The file must live at the exact path https://yourdomain.com/robots.txt. Upload it via:

  • Static sites: drop robots.txt into public/ (Astro, Next.js), static/ (Hugo, SvelteKit), or the project root (plain HTML).
  • WordPress: use an SEO plugin (Yoast, Rank Math) or upload via SFTP to the site root.
  • Cloudflare Pages / Netlify / Vercel: commit it to the folder your build publishes (such as public/ or static/), so the build output places it at the site root.
  • Shopify: edit robots.txt.liquid in the theme editor (Shopify ships a default).

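After uploading, verify the file at yourdomain.com/robots.txt and check it in Google Search Console's robots.txt report, which replaced the older robots.txt Tester. The URL Inspection tool shows whether a specific URL is blocked.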
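The check can also be scripted. The sketch below uses only the Python standard library. `robots_url` and `check_robots` are hypothetical names, and the 200 / text/plain expectations come from the FAQ further down. Each scheme, host and port combination has its own robots.txt, so shop.example.com needs its own file.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_url(page_url: str) -> str:
    """robots.txt lives at the root of each scheme + host (+ port)."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def check_robots(site_url: str, timeout: float = 10.0) -> list[str]:
    """Fetch robots.txt and list problems (an empty list means it looks fine).

    urlopen() follows redirects and raises urllib.error.HTTPError for
    4xx/5xx responses, so a missing file surfaces as an exception.
    """
    problems: list[str] = []
    with urlopen(robots_url(site_url), timeout=timeout) as resp:
        if resp.status != 200:
            problems.append(f"expected HTTP 200, got {resp.status}")
        content_type = resp.headers.get("Content-Type", "")
        if content_type.split(";")[0].strip().lower() != "text/plain":
            problems.append(f"expected text/plain, got {content_type!r}")
    return problems
```

For example, check_robots("https://yourdomain.com/any/page") fetches https://yourdomain.com/robots.txt and returns an empty list when the status and content type both look right.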

Common Mistakes to Avoid

  • Blocking CSS/JS: Google needs these to render your page. Don't disallow them.
  • Using robots.txt to hide content: disallowed URLs can still appear in search results without a snippet. Use noindex meta tags or HTTP auth for real hiding.
  • Case sensitivity: paths are case-sensitive. /Admin/ and /admin/ are different rules.
  • Missing trailing slash: Disallow: /blog blocks /blog, /blog/ and /blog-archive. Use /blog/ if you only want the directory (the bare /blog URL then stays crawlable).
  • Forgetting the sitemap line: a single Sitemap: line lets every crawler that reads robots.txt find your sitemap without a separate submission to each search engine.
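The trailing-slash and case-sensitivity pitfalls are easy to confirm. Rules without wildcards are simple prefix matches, and Python's `urllib.robotparser` agrees with Google on them. In this sketch, `blocked` and the agent name `ExampleBot` are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocked(rules: str, url: str) -> bool:
    """True if the '*' group in `rules` blocks `url` for a generic bot."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch("ExampleBot", url)


no_slash = "User-agent: *\nDisallow: /blog\n"
with_slash = "User-agent: *\nDisallow: /blog/\n"
lowercase = "User-agent: *\nDisallow: /admin/\n"

print(blocked(no_slash, "https://example.com/blog-archive"))    # True: prefix match
print(blocked(with_slash, "https://example.com/blog-archive"))  # False
print(blocked(with_slash, "https://example.com/blog"))          # False: bare /blog no longer matches
print(blocked(lowercase, "https://example.com/Admin/"))         # False: paths are case-sensitive
```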

Related SEO Tools

Pair your robots.txt with a sitemap generator, a Google SERP preview for your titles and descriptions, and a schema markup generator for richer results.

Frequently Asked Questions

What is robots.txt?

A robots.txt file is a plain text file placed at the root of your domain that tells search engine crawlers which URLs they can or cannot request. It follows the Robots Exclusion Protocol (RFC 9309) and is one of the first files every crawler fetches when visiting your site.

Where should I put my robots.txt file?

Upload it to the root of your domain so it is reachable at https://yourdomain.com/robots.txt. It must be served with Content-Type: text/plain and return a 200 status code. A file at /blog/robots.txt or /subfolder/robots.txt is ignored.

Does robots.txt block pages from appearing in Google?

No. Robots.txt only blocks crawling, not indexing. A disallowed URL can still appear in Google Search if other pages link to it — it will just show without a snippet. To fully remove a page from search results, use a <meta name="robots" content="noindex"> tag or an X-Robots-Tag HTTP header and allow Google to crawl the page so it can read the directive.
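Here is a minimal sketch of the header approach for Python sites that speak WSGI, the standard server interface used by frameworks such as Flask and Django. The middleware adds X-Robots-Tag: noindex to responses under chosen path prefixes. `noindex_middleware`, `hello_app` and the /drafts/ prefix are hypothetical. As the answer above says, those URLs must stay crawlable in robots.txt, or Google never sees the header.

```python
def noindex_middleware(app, prefixes: tuple[str, ...] = ("/private/",)):
    """Wrap a WSGI app so responses under `prefixes` carry X-Robots-Tag: noindex."""
    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def start(status, headers, exc_info=None):
            if path.startswith(prefixes):
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, start)
    return wrapped


def hello_app(environ, start_response):
    """A trivial WSGI app standing in for your real site."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<h1>Hello</h1>"]


application = noindex_middleware(hello_app, prefixes=("/drafts/",))
```

To try it locally, serve `application` with the standard library's wsgiref.simple_server.make_server("", 8000, application).serve_forever() and inspect the headers returned for a /drafts/ URL.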

Should I block AI crawlers like GPTBot and ClaudeBot?

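It depends on your goals. Blocking GPTBot (OpenAI), ClaudeBot (Anthropic), Google-Extended (Gemini training) and CCBot (Common Crawl) tells those companies' crawlers not to use your content to train language models. It does not affect regular search engine rankings: Google-Extended is a separate robots.txt token, and blocking it leaves Googlebot untouched. Blocking training crawlers also does not necessarily remove you from AI assistants' answers. OpenAI, for example, uses a separate OAI-SearchBot user-agent for ChatGPT search, so publishers who want to stay citable should keep the search-oriented agents allowed.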

What is the difference between Disallow and noindex?

Disallow in robots.txt stops crawlers from fetching a URL at all. noindex (in a meta tag or HTTP header) lets crawlers fetch the page but tells them not to include it in search results. For removing pages from Google, noindex is almost always the correct choice.

Do wildcards work in robots.txt?

Yes for most modern crawlers. * matches any sequence of characters, and $ anchors a pattern to the end of the URL. For example, Disallow: /*.pdf$ blocks every URL ending in .pdf. Googlebot, Bingbot and Yandex all support these extensions.

Should I include my sitemap in robots.txt?

Yes. Adding Sitemap: https://yourdomain.com/sitemap.xml is the simplest way to help every search engine discover your XML sitemap without needing to submit it through each webmaster console. You can list multiple sitemaps on separate lines.

Can I use robots.txt to hide sensitive URLs?

No. Robots.txt is publicly readable, so listing a URL there actually advertises its existence to anyone who views the file. For private content, use HTTP authentication, IP allow-listing, or return a 403/404 for unauthorized requests — never rely on robots.txt for security.

How do I test my robots.txt file?

Use the robots.txt report in Google Search Console (the successor to the retired robots.txt Tester) to see how Google fetched and parsed your file, and check Bing Webmaster Tools for Bingbot. Always fetch yourdomain.com/robots.txt in a browser after uploading to confirm it returns the expected content with a 200 status.