Meta-ExternalAgent — Meta's AI Training Crawler

Meta-ExternalAgent collects training data for LLaMA and Meta AI. Learn about its robots.txt compliance, how to block it, and what still works after blocking.

QUICK FACTS

USER-AGENT Meta-ExternalAgent

OPERATOR Meta

CATEGORY AI Training

FIRST SEEN 2024-01

ROBOTS.TXT ⚠ Partial compliance reported


What is Meta-ExternalAgent?

Meta-ExternalAgent is Meta's web crawler for gathering training data for its AI initiatives, including the LLaMA large language models. It also supports Meta's development of independent search infrastructure. The crawler identifies itself as meta-externalagent/1.1 in the User-Agent HTTP header. Some website administrators have reported inconsistent compliance with robots.txt directives.
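Because the crawler announces itself in the User-Agent header, one way to see how often it visits is to scan your access logs. The following is a minimal sketch, assuming logs in the common "combined" format; the function name meta_agent_hits and the sample log lines are made up for illustration.

```python
import re
from collections import Counter

# Combined Log Format: the user agent is the last quoted field on the line.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)


def meta_agent_hits(lines, token="meta-externalagent"):
    """Count requested paths for log lines whose user agent contains `token`."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line.strip())
        # Compare in lowercase so the match is case-insensitive.
        if match and token in match.group("agent").lower():
            hits[match.group("path")] += 1
    return hits


# Hypothetical sample lines; real logs would be read from a file.
sample_log = [
    '203.0.113.7 - - [01/Jan/2025:00:00:01 +0000] "GET /post/1 HTTP/1.1" 200 512 "-" "meta-externalagent/1.1"',
    '203.0.113.7 - - [01/Jan/2025:00:00:02 +0000] "GET /post/1 HTTP/1.1" 200 512 "-" "meta-externalagent/1.1"',
    '198.51.100.4 - - [01/Jan/2025:00:00:03 +0000] "GET /about HTTP/1.1" 200 128 "-" "Mozilla/5.0"',
]

print(meta_agent_hits(sample_log))
```

Running the same count before and after adding a robots.txt rule gives a rough picture of whether the crawler is honoring it on your site.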

How to Block Meta-ExternalAgent

Add the following to your robots.txt file (located at the root of your website):

User-agent: Meta-ExternalAgent
Disallow: /
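To sanity-check the rule before deploying it, you can run it through Python's standard-library robots.txt parser. This is only a sketch: the helper name is_allowed and the example.com URL are placeholders, and the parser shows how a compliant client would read the file, not how Meta's crawler actually behaves.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Meta-ExternalAgent
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The blocked crawler is denied; agents without a matching rule stay allowed.
print(is_allowed(ROBOTS_TXT, "meta-externalagent/1.1", "https://example.com/page"))
print(is_allowed(ROBOTS_TXT, "Mozilla/5.0", "https://example.com/page"))
```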

What Happens When You Block Meta-ExternalAgent

When the crawler honors the rule, it stops fetching your pages, which keeps newly crawled content out of training for LLaMA and other Meta AI models. Facebook link previews (handled by facebookexternalhit) are not affected.

Enforcement Beyond robots.txt

robots.txt is advisory: it only works when a crawler chooses to obey it, and Meta-ExternalAgent has been reported to comply inconsistently. For stronger enforcement, consider using:

Cloudflare WAF rules — Block requests matching the Meta-ExternalAgent user-agent string at the edge
Server-level blocking — Use .htaccess (Apache) or nginx rules to return 403 for the user-agent
Rate limiting — Throttle requests from the user-agent to reduce server load
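If your site runs on a Python application server, the same server-level idea can be expressed as a small WSGI middleware that returns 403 when the User-Agent matches. This is a minimal sketch rather than a production setup; the names block_meta_agent, hello_app, and status_for are invented for the example.

```python
from wsgiref.util import setup_testing_defaults

BLOCKED_TOKEN = "meta-externalagent"


def block_meta_agent(app):
    """Wrap a WSGI app so requests from the blocked user agent get a 403."""
    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if BLOCKED_TOKEN in agent:
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Every other client is passed through to the wrapped app.
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    """Hypothetical stand-in for your real application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]


def status_for(user_agent):
    """Send one fake request through the middleware and return its status."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = user_agent
    seen = []
    block_meta_agent(hello_app)(environ, lambda status, headers: seen.append(status))
    return seen[0]


print(status_for("meta-externalagent/1.1"))
print(status_for("Mozilla/5.0"))
```

Keep in mind that this matches only on the User-Agent string, which any client can change, so it stops the crawler only while it identifies itself honestly.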

Should You Block Meta-ExternalAgent?

Meta-ExternalAgent is a training crawler: it collects data to build AI models. If you want to prevent your content from being used in future AI training by Meta, block it. Blocking is not retroactive: it only affects future crawls, not data already collected.

Meta-ExternalAgent vs Other Meta Crawlers

Meta operates multiple crawlers, each serving a different purpose:

User-agent	Purpose	Type
Meta-ExternalAgent	Collects training data for Meta's LLaMA models	AI Training
FacebookBot	Indexes content for Meta's AI features beyond link previews	AI Feature Indexing

Each crawler operates independently. Blocking Meta-ExternalAgent does not block FacebookBot — you must add a separate rule for each.
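Since each crawler needs its own rule, it can help to generate the groups from a list of user-agents and verify them together. The sketch below assumes you want to disallow the whole site for every listed agent; build_robots_txt and can_fetch are illustrative helper names, and the check uses Python's standard-library parser rather than any Meta tooling.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(agents):
    """Build one 'Disallow: /' group per user-agent, separated by blank lines."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    return "\n\n".join(groups) + "\n"


def can_fetch(robots_txt, user_agent, url="https://example.com/"):
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


robots_txt = build_robots_txt(["Meta-ExternalAgent", "FacebookBot"])
print(robots_txt)

# Both listed crawlers are denied; facebookexternalhit has no rule,
# so link previews remain allowed.
for agent in ("Meta-ExternalAgent", "FacebookBot", "facebookexternalhit"):
    print(agent, can_fetch(robots_txt, agent))
```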


