robots.txt allows AI bots
GPTBot, ClaudeBot, CCBot, and Google-Extended are the named user agents used by today's largest AI ingesters. Disallowing them in robots.txt is the explicit "do not include this site in any LLM corpus" signal, and it is often set inadvertently when authors copy-paste a generic robots.txt template.
How the check decides
The check parses your robots.txt with robots-parser and asks, for each of GPTBot, ClaudeBot, CCBot, and Google-Extended, whether the site root (/) is allowed. It passes if all four are allowed and fails, listing the blocked bots, if any is disallowed. If no robots.txt exists at all, the check passes: no robots.txt implies allow-all.
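The decision above can be sketched in a few lines. The real check uses the JavaScript robots-parser package; this sketch substitutes Python's stdlib urllib.robotparser, which has equivalent semantics for rules this simple, and the example.com root URL is just a placeholder:

```python
from typing import List, Optional
from urllib.robotparser import RobotFileParser

# The four user agents the check queries.
AI_BOTS = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended"]

def blocked_ai_bots(robots_txt: Optional[str],
                    site_root: str = "https://example.com/") -> List[str]:
    """Return the AI bots disallowed from the site root.

    An empty list means the check passes. A missing robots.txt
    (None) passes by definition: no robots.txt implies allow-all.
    """
    if robots_txt is None:
        return []
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # can_fetch() resolves each bot against its most specific
    # user-agent group, falling back to the * group.
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, site_root)]
```

Run against the Fail example below, this returns ["GPTBot", "ClaudeBot"]; against the Pass example or a missing file, it returns an empty list.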
How to implement it
Either omit named AI bot user agents entirely, so the global User-agent: * rule applies, or add explicit allow rules for them. Don't add a group such as "User-agent: GPTBot" followed by "Disallow: /" unless you've decided you actively don't want this site in any training corpus.
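If you take the explicit-allow route, several user agents can share one group (robots.txt permits multiple User-agent lines per group). One way to write it, covering the four bots the check queries:

User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Google-Extended
Allow: /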
Pass
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
Fail
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /