Bytespider — ByteDance's AI Crawler

Bytespider is ByteDance's aggressive web crawler. Learn how to block it in robots.txt and why you may need WAF rules for enforcement.

QUICK FACTS

USER-AGENT: Bytespider
OPERATOR: ByteDance
CATEGORY: AI Training
FIRST SEEN: 2022
ROBOTS.TXT: ⚠ Partial compliance reported

What is Bytespider?

Bytespider is a web crawler attributed to ByteDance, the parent company of TikTok. It collects web data that may be used for AI model training and search features. Site operators have reported it as one of the highest-volume AI crawlers, and some have reported that it ignores robots.txt directives.

How to Block Bytespider

Add the following to your robots.txt file (located at the root of your website):

User-agent: Bytespider
Disallow: /
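
Before deploying the rule, it can help to confirm that it parses the way you expect. The following is a minimal sketch using Python's standard-library urllib.robotparser; the helper name is_allowed and the example.com URL are illustrative assumptions, and the result only shows how a compliant parser reads the file, not how Bytespider itself behaves.

```python
from urllib.robotparser import RobotFileParser

# The rule from this page, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant parser would let user_agent fetch url.

    Hypothetical helper for illustration. Pass the crawler's product
    token (e.g. "Bytespider"), not a full User-Agent header string.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/any/page"  # placeholder URL
    print("Bytespider allowed:", is_allowed(ROBOTS_TXT, "Bytespider", url))
    print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", url))
```

Because the file contains no rule for other user-agents, crawlers other than Bytespider remain allowed under this configuration.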

What Happens When You Block Bytespider

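A Disallow rule tells Bytespider not to crawl any part of your site for ByteDance's AI training. The rule only works if the crawler honors it: robots.txt is advisory, and Bytespider's compliance has been reported as inconsistent, so consider WAF rules for enforcement.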
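
One way to see whether the rule is being honored is to look for Bytespider requests in your access log after the change. The sketch below assumes the common "combined" log format used by Apache and nginx; the regular expression, the function name bytespider_hits, and the sample log lines are all illustrative assumptions, and crawlers may take some time to pick up a changed robots.txt.

```python
import re
from collections import Counter

# Assumed combined log format:
# ip - - [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Made-up sample entries for demonstration only.
SAMPLE_LOG = [
    '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /robots.txt HTTP/1.1" '
    '200 64 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '203.0.113.7 - - [10/Oct/2024:13:55:40 +0000] "GET /articles/1 HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '198.51.100.2 - - [10/Oct/2024:13:56:01 +0000] "GET /articles/1 HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]


def bytespider_hits(lines, token="bytespider"):
    """Count requests per path whose User-Agent contains token.

    Requests for /robots.txt itself are skipped, since fetching that
    file is expected even from a compliant crawler.
    """
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or token not in match["agent"].lower():
            continue
        path = match["path"].split("?", 1)[0]
        if path == "/robots.txt":
            continue
        hits[path] += 1
    return hits


if __name__ == "__main__":
    for path, count in bytespider_hits(SAMPLE_LOG).most_common():
        print(count, path)
```

Any paths that keep appearing well after the rule was published suggest that robots.txt alone is not sufficient for your site.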

Enforcement Beyond robots.txt

Site operators have reported that Bytespider does not consistently honor robots.txt. For stronger enforcement, consider using:

Cloudflare WAF rules: Block requests whose User-Agent header contains Bytespider at the edge, before they reach your server
Server-level blocking: Use .htaccess (Apache) or nginx rules to return 403 for that user-agent
Rate limiting: Throttle requests from that user-agent to reduce server load
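
The server-level option can also be applied inside the application itself. The following is a minimal sketch of the same idea as a Python WSGI middleware that returns 403 when the User-Agent header contains "Bytespider". The names block_user_agents, demo_app, and status_for are hypothetical, and the User-Agent strings in the demo are illustrative rather than the crawler's exact header. Matching on User-Agent only catches requests that identify themselves honestly.

```python
from wsgiref.util import setup_testing_defaults

# Lowercase substrings to look for in the User-Agent header.
BLOCKED_TOKENS = ("bytespider",)


def block_user_agents(app, blocked_tokens=BLOCKED_TOKENS):
    """Wrap a WSGI app so matching user-agents receive 403 Forbidden."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in blocked_tokens):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everything else passes through to the wrapped application.
        return app(environ, start_response)

    return middleware


def demo_app(environ, start_response):
    """Stand-in application used only for this demonstration."""
    body = b"Hello\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def status_for(app, user_agent):
    """Call app once with the given User-Agent and return its status line."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = user_agent
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    b"".join(app(environ, start_response))
    return captured["status"]


if __name__ == "__main__":
    guarded = block_user_agents(demo_app)
    print(status_for(guarded, "Mozilla/5.0 (compatible; Bytespider)"))
    print(status_for(guarded, "Mozilla/5.0 (X11; Linux x86_64)"))
```

Blocking at the edge or in the web server is usually cheaper than blocking in application code, since the request is rejected before it reaches the application. A sketch like this is mainly useful where you cannot change the server or CDN configuration.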

Should You Block Bytespider?

Bytespider is classified here as a training crawler: it collects data to build AI models. If you want to keep your content out of future AI training by ByteDance, block it. Blocking is not retroactive. It affects only future crawls, not data already collected.

Bytespider vs Other ByteDance Crawlers

ByteDance currently operates Bytespider as a standalone crawler. Unlike OpenAI and Anthropic, which split functionality across multiple user-agents, ByteDance uses a single identifier for its AI crawling operations.
