---
title: robots.txt allows AI bots
description: "GPTBot, ClaudeBot, CCBot, and Google-Extended are the named user-agents that today's largest AI ingesters use. Disallowing them in robots.txt is the explicit \"do not include this site in any LLM\" signal, and it's almost always set inadvertently when authors copy-paste a generic robots.txt template."
doc_version: 0.2.0
last_updated: "2026-05-08T23:40:54.245Z"
---

## How the check decides

The check parses your robots.txt with [`robots-parser`](https://github.com/samclarke/robots-parser) and asks, for each of `GPTBot`, `ClaudeBot`, `CCBot`, and `Google-Extended`, whether the site root (`/`) is allowed. It passes if all four are allowed and fails, listing the blocked bots, if any are disallowed. If no robots.txt exists at all, the check passes: a missing robots.txt implies allow-all.
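
A minimal sketch of that decision logic, assuming Node with `robots-parser` installed; the inline robots.txt content and the console reporting here are illustrative, not the checker's actual code:

```ts
import robotsParser from "robots-parser";

// The four user-agents the check queries.
const AI_BOTS = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended"];

// Illustrative robots.txt; the real check fetches it from the site.
const robotsTxt = [
  "User-agent: *",
  "Allow: /",
  "",
  "User-agent: GPTBot",
  "Disallow: /",
].join("\n");

const robots = robotsParser("https://example.com/robots.txt", robotsTxt);

// Ask the parser whether each bot may fetch the site root.
// isAllowed() can return undefined for off-domain URLs, so treat
// anything other than an explicit `true` as blocked.
const blocked = AI_BOTS.filter(
  (bot) => robots.isAllowed("https://example.com/", bot) !== true
);

console.log(
  blocked.length === 0
    ? "Pass: all four AI bots may crawl /"
    : `Fail: blocked bots: ${blocked.join(", ")}`
);
```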

## How to implement it

Either omit the named AI bot user-agents entirely, so the global `User-agent: *` rule applies, or add explicit allow rules for them; both variants are shown under Pass below. Don't add a `User-agent: GPTBot` block with `Disallow: /` unless you've decided you actively don't want to be in a corpus.

### Pass

```text
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
```
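
If you'd rather name the bots explicitly, this variant also passes; the four user-agents match the ones the check queries:

```text
User-agent: *
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: CCBot
Allow: /

User-agent: Google-Extended
Allow: /

Sitemap: https://example.com/sitemap.xml
```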

### Fail

```text
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
```

## Sitemap

Full docs site index: [/sitemap.md](/sitemap.md).
