Diffbot — Knowledge Graph Extraction Crawler
Diffbot extracts structured data for its knowledge graph, licensed by AI companies. Learn how to block it and what that means for downstream models.
What is Diffbot?
Diffbot builds a structured knowledge graph of the web by extracting entities, facts, and relationships from web pages. This knowledge graph is licensed to AI companies, search engines, and enterprise clients. Diffbot's data has been used as training input by multiple large AI model developers.
How to Block Diffbot
Add the following to your robots.txt file (located at the root of your website):
User-agent: Diffbot
Disallow: /
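Before deploying the rule, it can help to confirm that a standard parser reads it the way you intend. The following is a minimal sketch using Python's standard-library `urllib.robotparser`. The helper name `is_allowed` and the `example.com` URL are illustrative assumptions, and the check only shows how this particular parser interprets the rule, not how Diffbot's own crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# The rule from the section above, written as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Diffbot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules.

    `is_allowed` is a hypothetical helper for this example, not part of
    any Diffbot tooling.
    """
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/some/page"  # placeholder URL
    print("Diffbot allowed:", is_allowed(ROBOTS_TXT, "Diffbot", page))
    print("Other bot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", page))
```

With only this rule in the file, the parser should report Diffbot as disallowed and leave other user-agents unaffected, since no wildcard group is defined.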
What Happens When You Block Diffbot
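Provided Diffbot honors the directive (robots.txt is advisory, not enforced), it will not extract data from your pages for its knowledge graph. Downstream AI models that license Diffbot data should then not receive newly extracted content from your site in future builds.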
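Because robots.txt relies on the crawler's cooperation, one way to check the effect of the block is to look for requests that still identify as Diffbot in your server logs. The sketch below assumes access logs in the common "combined" format, where the user agent is the last quoted field, and assumes the crawler's User-Agent header contains the token "Diffbot". Both are assumptions to verify against your own logs, and `count_diffbot_hits` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumption: the crawler's User-Agent header contains "Diffbot".
UA_TOKEN = re.compile(r"diffbot", re.IGNORECASE)
# Combined Log Format: the user agent is the last double-quoted field.
LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')
# The request line is quoted and starts with the HTTP method.
REQUEST = re.compile(r'"(?:GET|POST|HEAD) (\S+)')


def count_diffbot_hits(lines):
    """Count requested paths for log lines whose user agent mentions Diffbot."""
    hits = Counter()
    for line in lines:
        ua = LAST_QUOTED.search(line)
        if not ua or not UA_TOKEN.search(ua.group(1)):
            continue
        req = REQUEST.search(line)
        hits[req.group(1) if req else "?"] += 1
    return hits


if __name__ == "__main__":
    # Made-up sample lines; real user-agent strings may look different.
    sample = [
        '203.0.113.5 - - [01/Jan/2025:00:00:00 +0000] "GET /about HTTP/1.1" '
        '200 512 "-" "Mozilla/5.0 (compatible; Diffbot/0.1)"',
        '198.51.100.7 - - [01/Jan/2025:00:00:01 +0000] "GET / HTTP/1.1" '
        '200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    ]
    print(count_diffbot_hits(sample))
```

Hits on `/robots.txt` itself are expected after a block, since a compliant crawler still needs to read the file. Hits on other paths would suggest the rule is not being applied.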
Should You Block Diffbot?
Diffbot builds a licensed knowledge graph that multiple downstream AI companies use. Blocking it prevents future extraction of your content, but data already extracted may remain in versions of the graph that have been delivered to licensees. This is a broad opt-out that affects many downstream models at once.
Diffbot vs Other Diffbot Crawlers
Diffbot currently operates a single crawler under the Diffbot user-agent. Unlike companies such as OpenAI and Anthropic, which split functionality across multiple user-agents, Diffbot uses one identifier for its AI crawling operations.
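The practical consequence is that one robots.txt group covers Diffbot, while vendors with several crawlers need one User-agent line per identifier. A minimal sketch of generating such a group follows. The function name `build_block_rules` is hypothetical, and `GPTBot` and `ClaudeBot` appear only as examples of other vendors' tokens, so check each vendor's documentation for its current list.

```python
def build_block_rules(user_agents):
    """Return one robots.txt group that disallows every listed user agent.

    Consecutive User-agent lines followed by shared rules form a single
    group, so the Disallow line applies to all of them.
    """
    lines = [f"User-agent: {ua}" for ua in user_agents]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Diffbot alone needs a single identifier.
    print(build_block_rules(["Diffbot"]))
    # Example tokens for other vendors; verify them before relying on this.
    print(build_block_rules(["Diffbot", "GPTBot", "ClaudeBot"]))
```

Grouping agents under a shared Disallow line keeps the file short. Separate groups per agent work equally well if you want different rules for each crawler later.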