Bytespider — ByteDance's AI Crawler
Bytespider is ByteDance's aggressive web crawler. Learn how to block it in robots.txt and why you may need WAF rules for enforcement.
What is Bytespider?
Bytespider is a web crawler attributed to ByteDance, the parent company of TikTok. It collects web data that may be used for AI model training and search features. Bytespider has been one of the most aggressive AI crawlers in terms of request volume, and some site operators have reported it ignoring robots.txt directives.
How to Block Bytespider
Add the following to your robots.txt file (located at the root of your website):
User-agent: Bytespider
Disallow: /
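One way to sanity-check the rule before deploying it is to parse it with Python's standard urllib.robotparser module. This is a minimal sketch: the example.com URL and the SomeOtherBot agent name are placeholders, and the result only shows how a compliant parser reads the rule, not how Bytespider itself behaves.

```python
from urllib.robotparser import RobotFileParser

# The rule from this page, written as two separate lines.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant parser would let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com and SomeOtherBot are placeholders for illustration only.
bytespider_allowed = is_allowed(ROBOTS_TXT, "Bytespider", "https://example.com/any/page")
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/any/page")

print("Bytespider allowed:", bytespider_allowed)
print("Other crawler allowed:", other_allowed)
```

If the second check came back blocked as well, the rule would be broader than intended, for example because it was placed under a wildcard user-agent group.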
What Happens When You Block Bytespider
Blocking Bytespider in robots.txt tells it not to crawl your site for ByteDance's AI training. Because its compliance has reportedly been inconsistent, consider adding WAF rules for enforcement.
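To see whether the rule is being honored on your own site, you can look for Bytespider requests in your access logs after the robots.txt change goes live. The sketch below assumes the common "combined" access log format, in which the user agent is the last quoted field. The sample lines, IP addresses, and the shortened user-agent string are made up for illustration, so adapt the pattern to your server's actual log format.

```python
import re
from collections import Counter

# Assumed layout (combined log format):
# ... "METHOD PATH PROTOCOL" STATUS SIZE "REFERER" "USER-AGENT"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)


def count_bytespider_hits(lines):
    """Count requests per path whose user agent mentions Bytespider."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line.strip())
        if match and "bytespider" in match.group("agent").lower():
            hits[match.group("path")] += 1
    return hits


# Made-up sample entries; real logs would be read from a file instead.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET /robots.txt HTTP/1.1" 200 42 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '203.0.113.5 - - [01/Jan/2025:00:00:02 +0000] "GET /articles/1 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '198.51.100.7 - - [01/Jan/2025:00:00:03 +0000] "GET /articles/1 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

hits = count_bytespider_hits(SAMPLE_LOG)
# Fetching /robots.txt itself is expected; anything else suggests the rule is being ignored.
non_robots = {path: n for path, n in hits.items() if path != "/robots.txt"}
print(non_robots)
```

Requests for paths other than /robots.txt that keep arriving well after the change are a reasonable signal that stronger enforcement is worth setting up.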
Enforcement Beyond robots.txt
Bytespider has been reported to have inconsistent robots.txt compliance. For stronger enforcement, consider using:
- Cloudflare WAF rules — Block requests matching the Bytespider user-agent string at the edge
- Server-level blocking — Use .htaccess (Apache) or nginx rules to return 403 for the user-agent
- Rate limiting — Throttle requests from the user-agent to reduce server load
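If you cannot change the WAF or web server configuration, the same idea can be approximated inside the application. The following is a minimal sketch of a WSGI middleware, not a Cloudflare, Apache, or nginx rule. The class name BytespiderGate and its parameters are hypothetical. It matches on the user-agent header only, so it catches just the requests that identify themselves as Bytespider, and it uses a single in-memory fixed-window counter that is not thread-safe.

```python
import time


class BytespiderGate:
    """WSGI middleware that blocks or throttles requests whose User-Agent mentions Bytespider."""

    def __init__(self, app, max_per_window=0, window_seconds=60, clock=time.monotonic):
        self.app = app
        self.max_per_window = max_per_window  # 0 means reject every matching request
        self.window_seconds = window_seconds
        self.clock = clock
        self.window_start = clock()
        self.count = 0

    def __call__(self, environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "")
        if "bytespider" not in agent.lower():
            return self.app(environ, start_response)
        if self.max_per_window == 0:
            return self._reject(start_response, "403 Forbidden")
        # Fixed-window rate limit: reset the counter once the window has elapsed.
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            self.window_start, self.count = now, 0
        self.count += 1
        if self.count > self.max_per_window:
            return self._reject(start_response, "429 Too Many Requests")
        return self.app(environ, start_response)

    @staticmethod
    def _reject(start_response, status):
        body = status.encode("ascii")
        start_response(status, [("Content-Type", "text/plain"),
                                ("Content-Length", str(len(body)))])
        return [body]


def demo_app(environ, start_response):
    """Stand-in for the real application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


def status_for(app, user_agent):
    """Call a WSGI app with a fake request and return the status line it sends."""
    captured = []
    app({"HTTP_USER_AGENT": user_agent}, lambda status, headers: captured.append(status))
    return captured[0]


blocking = BytespiderGate(demo_app)                     # always answers 403
throttling = BytespiderGate(demo_app, max_per_window=2)  # allows 2 requests per minute
```

Blocking at the edge or in the web server is usually preferable because rejected requests never reach the application. A middleware like this is mainly useful as a fallback or for testing the matching logic.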
Should You Block Bytespider?
Bytespider is generally treated as a training crawler: it collects data that may be used to build AI models. If you want to keep your content out of future AI training by ByteDance, block it. Blocking is not retroactive. It affects only future crawls, not data that has already been collected.
Bytespider vs Other ByteDance Crawlers
ByteDance currently operates Bytespider as a standalone crawler. Unlike companies such as OpenAI and Anthropic, which split functionality across multiple user-agents, ByteDance uses a single identifier for its AI crawling operations.