---
title: robots.txt allows AI bots
description: "GPTBot, ClaudeBot, CCBot, and Google-Extended are the named user-agents that today's largest AI ingesters use. Disallowing them in robots.txt is the explicit \"do not include this site in any LLM\" signal — and it's almost always set inadvertently when authors copy-paste a generic robots.txt template."
doc_version: 0.2.0
last_updated: "2026-04-24T21:30:56.217Z"
---

## How the check decides

The check parses your robots.txt with [`robots-parser`](https://github.com/samclarke/robots-parser) and asks whether each of `GPTBot`, `ClaudeBot`, `CCBot`, and `Google-Extended` is allowed to crawl the site root (`/`). It passes if all four are allowed and fails, listing the blocked bots, if any are disallowed. If no robots.txt exists at all, the check passes: a missing robots.txt implies allow-all.
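The decision rule can be sketched without the library. This is a simplified approximation: `blockedAiBots` and the exact matching rules here are illustrative, and the real check delegates parsing and path matching to `robots-parser`.

```javascript
const AI_BOTS = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended"];

// Return the AI bots whose robots.txt rules disallow the site root ("/").
// Simplifications: a named group fully overrides the "*" group (per the
// Robots Exclusion Protocol), and only prefix matching is implemented.
function blockedAiBots(robotsTxt) {
  const groups = {}; // lowercased user-agent -> array of Disallow paths
  let agents = []; // user-agents the current group's rules apply to
  let sawRule = true; // a User-agent line after a rule starts a new group
  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.split("#")[0].trim(); // strip comments
    const m = line.match(/^([^:]+):\s*(.*)$/);
    if (!m) continue;
    const field = m[1].trim().toLowerCase();
    const value = m[2].trim();
    if (field === "user-agent") {
      if (sawRule) agents = [];
      sawRule = false;
      agents.push(value.toLowerCase());
      if (!groups[value.toLowerCase()]) groups[value.toLowerCase()] = [];
    } else if (field === "disallow") {
      sawRule = true;
      for (const a of agents) groups[a].push(value);
    } else {
      sawRule = true; // Allow, Sitemap, etc. still terminate the group header
    }
  }
  return AI_BOTS.filter((bot) => {
    // Fall back to the "*" group only when the bot has no named group.
    const rules = groups[bot.toLowerCase()] || groups["*"] || [];
    // An empty Disallow value allows everything; otherwise prefix-match "/".
    return rules.some((p) => p !== "" && "/".startsWith(p));
  });
}
```

An empty result means the check passes; a non-empty result is the list of blocked bots reported in the failure message.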

## How to implement it

Either omit named AI bot user-agents entirely, so the global `User-agent: *` group applies to them, or add explicit allow rules for each of the four. Don't add a `User-agent: GPTBot` group with `Disallow: /` unless you've actively decided you don't want your site in a training corpus.

### Pass

```text
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
```
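The explicit-allow option also passes. A sketch (the named groups override the `*` group for those bots, so keep the rules permissive):

```text
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: CCBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: *
Allow: /
```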

### Fail

```text
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
```

## Sitemap

Full docs site index: [/sitemap.md](/sitemap.md).
