v0.2.0 benchmark · July 1, 2026 · 5 arms, 25 runs
llms.txt saves an agent real tokens. But it won't read it unless you ask.
- Page-side links: 0 / 20 reps used llms.txt
- One line in the prompt: 5 / 5 reps used it
- Tokens, with the nudge: −33% on the same page
Summary
Ship the discovery layer, then tell the agent to use it.
Publishing an llms.txt is increasingly common, though far from
universal: in our survey of the CrUX top-100k, only about 1 in 4 sites (27% of the
50,074 we reached) ship one. The harder part is getting an AI agent to actually read the file. In an earlier
a14y benchmark, a clean discovery layer (an llms.txt index plus markdown mirrors) cut an agent's
token use roughly in half at the same answer quality. In a new benchmark, the agent captured that saving only
when we told it to look. Not one of the in-page signals we tried (a <head> tag, a visible
"For agents" footer link) got it to use llms.txt on its own. A single line in the prompt did.
The payoff is real
Let's start with the good news. In our earlier discovery-layer study, we ran the same evaluation task twice against
the same site: once with the agent-discovery files served, once with them missing. Same content, same prompt,
same model. The discovery layer cut the cost about in half.
| Condition | Tokens (mean) | Tool calls | Answer quality (judge) |
| --- | --- | --- | --- |
| Discovery layer present | ~123K | ~10 | tied |
| Discovery layer absent | ~240K | ~20 | tied |
Roughly half the tokens, about half the tool calls, no measurable drop in quality. When an agent can pull a
tight markdown index instead of crawling and re-parsing a pile of HTML, it does less work to reach the same
answer.
But the agent won't take the payoff on its own
Five arms, identical content and identical discovery files (all returning 200). Across the four page-side arms,
the only thing that changed was how the page points at /llms.txt; the fifth arm left the page with no link
at all and changed one line of the prompt. Claude 2.1.141, isolated Docker, five reps each, adoption
read from the web server's request log.
| Arm | Signal | Fetched /llms.txt | Tokens (avg ± sd) | Tools | Page GETs | Judge | Pass |
| --- | --- | --- | --- | --- | --- | --- | --- |
| no-link (control) | none | 0 / 5 | 266,591 ± 32,066 | 14.6 | 59 | 75.2 ± 6.8 | 80% |
| head link | <link rel="llms-txt"> | 0 / 5 | 257,743 ± 53,953 | 13.2 | 47 | 72.4 ± 8.3 | 80% |
| head link | <link rel="alternate" type="text/markdown"> | 0 / 5 | 254,641 ± 47,968 | 14.4 | 65 | 80.4 ± 5.5 | 100% |
| visible footer link | "For agents" block | 0 / 5 | 251,741 ± 38,857 | 12.8 | 48 | 76.0 ± 3.3 | 100% |
| nudge | prompt: "check /llms.txt first" | 5 / 5 (+4/5 sitemap.md) | 177,735 ± 32,131 | 17.0 | 91 | 88.8 ± 1.6 | 100% |
We tried three genuinely different ways to point the page at /llms.txt: a machine-readable
<link rel="llms-txt"> in the head, a standards-style
<link rel="alternate" type="text/markdown">, and a plain visible "For agents" footer link a
human would click. The agent ignored all three. The four page arms cluster tightly, 252K to 267K tokens and 0
of 5 adoption, with the spread between link styles sitting well inside the noise (token sd of 32K to 54K, judge
sd of 3 to 8). The agent received the bytes (it fetched the homepage, head tags and all) and acted on none of
them.
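To make "well inside the noise" concrete, here is a minimal sketch that compares the gap between the four page-arm means with the within-arm standard deviations from the table. It is an informal sanity check under our own framing, not a significance test, and the dictionary labels are ours.

```python
# Mean tokens and standard deviations copied from the results table above.
# The arm labels are shorthand chosen for this sketch.
page_arms = {
    "no-link (control)": (266_591, 32_066),
    "head link (llms-txt)": (257_743, 53_953),
    "head link (alternate)": (254_641, 47_968),
    "visible footer link": (251_741, 38_857),
}

means = [mean for mean, _ in page_arms.values()]
sds = [sd for _, sd in page_arms.values()]

spread = max(means) - min(means)  # gap between the highest and lowest arm mean
smallest_sd = min(sds)            # the tightest within-arm variation

print(f"spread between page arms: {spread:,} tokens")
print(f"smallest within-arm sd:   {smallest_sd:,} tokens")
print(f"spread inside the noise:  {spread < smallest_sd}")
```

The gap between the best and worst page arm is smaller than even the tightest arm's own rep-to-rep variation, which is why we do not read anything into the ordering of the link styles.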
It's not that we picked the wrong link format. Every link format failed. The lever is the instruction, not the
markup. We changed one line of the prompt and left the page exactly as it was, with no link at all. Adoption
went to five out of five (and four of five also pulled sitemap.md), and tokens dropped from
266,591 to 177,735. That is 88,856 fewer tokens, a 33% reduction on the same page, attributable purely
to the agent using a layer that was already there in every arm. Sitting next to the earlier ~2x, the two figures
bracket the opportunity: the prior study measured having the layer at all, this one measures actually using it,
isolated to a single line of prompt.
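The arithmetic behind those two figures, as a short sketch. The control and nudge means come from the table; the earlier study's values are the rounded ~240K and ~123K, so the ratio is approximate.

```python
# Means from this benchmark (control arm versus nudge arm).
control_tokens = 266_591
nudge_tokens = 177_735

saved = control_tokens - nudge_tokens
reduction = saved / control_tokens
print(f"{saved:,} fewer tokens ({reduction:.0%} reduction)")

# Rounded means from the earlier discovery-layer study (layer absent versus present).
earlier_absent = 240_000
earlier_present = 123_000
print(f"earlier study: about {earlier_absent / earlier_present:.1f}x fewer tokens")
```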
A few details from the run that sharpen the picture:
- The cost is ingestion, not generation. Output tokens were tiny across every arm, roughly 5K to 9K per rep. Almost the entire bill is input plus cache, the cost of reading the site in. That is exactly the cost the discovery layer cuts, which is why "use the index" shows up as a token drop and not a quality drop.
- More fetches, far less content. The nudge arm actually made more requests than the control (91 page GETs versus 59), yet used fewer tokens. It pulled the lean .md mirrors (for example /scorecards/0.2.0/checks/llms-txt.exists.md) instead of full HTML pages. More round trips, a fraction of the bytes each.
- Quality went up, not down. The nudge arm posted the highest judge score (88.8) and the tightest spread (± 1.6) of any arm, at 100% pass. Using the structured index made the answers cheaper and more consistent.
- Clean measurement. Zero requests hit the live a14y.dev in any arm. Adoption was read from the static server's log, filtered to the agent container's IP, so there is no contamination from background traffic. (a14y.dev's own copy documents the llms-txt.exists check, so this was a friendly case for content-based discovery; adoption was still 0 of 5 without the nudge.)
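For readers who want to run the same adoption check on their own logs, here is a minimal sketch of the filter described in the last bullet. It assumes the server writes Common Log Format, which may not match the benchmark's actual log format. The function names, IP addresses, and sample lines are made up for illustration.

```python
import re

# Assumption: Common Log Format, e.g.
# 172.18.0.3 - - [27/Jun/2026:10:00:01 +0000] "GET /llms.txt HTTP/1.1" 200 2048
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)

def fetched_paths(log_lines, agent_ip):
    """Return paths the given IP fetched successfully (GET with status 200)."""
    paths = []
    for line in log_lines:
        m = LOG_LINE.match(line)
        if m and m["ip"] == agent_ip and m["method"] == "GET" and m["status"] == "200":
            paths.append(m["path"])
    return paths

def adopted_llms_txt(log_lines, agent_ip):
    """True if the agent's IP requested /llms.txt at least once."""
    return "/llms.txt" in fetched_paths(log_lines, agent_ip)

# Hypothetical log lines: one agent container plus unrelated background traffic.
sample_log = [
    '172.18.0.3 - - [27/Jun/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 5120',
    '172.18.0.3 - - [27/Jun/2026:10:00:01 +0000] "GET /llms.txt HTTP/1.1" 200 2048',
    '172.18.0.9 - - [27/Jun/2026:10:00:02 +0000] "GET /llms.txt HTTP/1.1" 200 2048',
]

print(adopted_llms_txt(sample_log, "172.18.0.3"))
```

Filtering on the container's IP is what keeps scanners and other background requests from being counted as adoption.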
This is not just a Claude quirk
It would be easy to write this off as one model's behavior on one site. It is not. When we went looking for
whether any major agent or crawler consumes llms.txt in the wild, the evidence pointed the same
direction.
The cleanest distinction to hold onto is publish versus consume. Many prominent vendors publish.
Cursor serves an llms.txt for its own docs. Anthropic publishes one. Every site built on
Mintlify gets an llms.txt (and an llms-full.txt) automatically, zero maintenance, which is how
a whole category of docs sites gained one at once. That is a real and healthy trend. But publishing a file is not the same as
an agent reading one, and on the consume side the record is close to empty.
- Ahrefs analyzed 137,210 domains. Of the roughly 38,000 with a valid llms.txt, 97% received zero requests for that file in the month studied. Of the requests that did land, most were ordinary bots, not AI agents. No AI bot went looking for files that did not exist.
- Google's John Mueller has said, more than once, that no AI system currently uses llms.txt, that you can see in your server logs the bots do not even check for it, and that it is "comparable to the keywords meta tag." When llms.txt files showed up on Google's own developer properties, he said plainly it was a CMS artifact, not an endorsement.
- Google's official generative-AI search guidance now states you do not need to create llms.txt or any special machine-readable files to appear in AI search, and that Search ignores them: keeping one is fine, it just "won't harm (nor help)" rankings.
- Independent server-log audits (Adobe AEM, OtterlyAI) put verified LLM fetches of llms.txt at roughly 0.1% to 1% of traffic to those files. Most of the rest is scanners and auditors.
And the tools people assume are quietly reading it? For GitHub Copilot, Windsurf, Cline, Aider, Continue, Devin,
ChatGPT's browsing and Atlas modes, and Gemini's agent, we found no public statement either way
(as of this research, mid-2026). Silence is not adoption. We also went in expecting to find that Claude Code
would fetch llms.txt when linked from the page. The hope did not survive verification. Claude Code
can show up in server logs fetching llms.txt, but only when a user or a prompt points it there, not
on its own. We have no evidence that any major agent fetches llms.txt by default.
What to actually do with this
The wrong conclusion here would be "don't bother." That is not what the data says: the value of the discovery
layer is real and significant, and it lands whenever you control the agent or its instructions. Three situations
where that is true today.
Your own agents
If you build a docs assistant, a support bot, an MCP server, or an internal coding agent that reads your
site, you write the prompt. Tell it to start from /llms.txt. This data says that a single
instruction is worth roughly a third of the tokens on the spot, and up to half against a no-discovery
baseline.
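As a minimal sketch of what that instruction can look like in your own agent, the helper below appends a one-line nudge to an existing system prompt. The function name and the exact wording of the nudge are hypothetical; the benchmark's prompt line amounted to "check /llms.txt first", and you should adapt the phrasing to your agent.

```python
from urllib.parse import urljoin

def with_llms_txt_nudge(base_prompt: str, site_url: str) -> str:
    """Append a single instruction pointing the agent at the site's llms.txt.

    Hypothetical helper: the wording below is illustrative, not the exact
    line used in the benchmark.
    """
    index_url = urljoin(site_url, "/llms.txt")  # always resolves to the site root
    nudge = (
        f"Before browsing {site_url}, check {index_url} first "
        "and prefer the markdown pages it links to."
    )
    return f"{base_prompt.rstrip()}\n\n{nudge}"

prompt = with_llms_txt_nudge(
    "You answer questions about our documentation.",
    "https://example.com/docs/",
)
print(prompt)
```

Keeping the nudge as a separate, final line makes it easy to toggle in an A/B run, so you can compare token counts with and without it.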
Agentic tooling you can configure
A growing set of harnesses let the user or the project point the agent at a markdown index. The payoff shows
up the moment the pointer exists. Notably, Chrome's experimental
"agentic browsing" audit in Lighthouse
already checks for llms.txt, a hint that the consume side may start catching up to the publish
side.
Future-proofing at near-zero cost
Publishing the layer is cheap and mostly static. If and when agents start consuming it by default, you are
already done. If they never do, you have lost very little, and your own tooling benefited the whole time.
What the data definitely does not support is treating llms.txt as an SEO move for third-party AI
search. Today, the big crawlers mostly ignore it. Publish it because it makes the agents you point at
your content dramatically cheaper and more accurate, not because ChatGPT or Gemini will discover it on their own.
Also, llms.txt is not a standard. It is a good idea that caught on, but a better convention may well
replace it. What the data backs is narrower and more durable: a clean and lean discovery layer saves real tokens,
and the way you capture that saving today is to tell the model to use it. So bet on the two things that hold no
matter how the conventions shake out (the token savings and the instruction) rather than on llms.txt
being the permanent answer.
Why keep this in the scorecard
At a14y we score sites on agent-readability, and llms.txt is part of that score. This research does
not weaken that; it sharpens it. A high score means your site is ready to be read cheaply by an agent.
Whether a given agent takes you up on it depends on how that agent was built or prompted, which is exactly the
variable we just isolated. We would rather tell you both sides than sell you one.
If you want to see the gap for yourself, run a14y against your docs, then point your own agent at the
llms.txt we find (or flag as missing) and watch the token count. That is the experiment that
convinced us, and it takes about ten minutes. We would genuinely like to see your numbers, especially if you can
get any agent to adopt llms.txt from a page-side link alone.
npx a14y https://example.com
Caveats
Where to read these numbers with care.
- Small n, one setting. Five reps per arm (adoption is a low-frequency, near-binary event), a single retrieval prompt, a single model (Claude 2.1.141), and a single site. Treat the adoption pattern as directional, not the last word.
- One site's content. Every arm serves the same a14y.dev content; only the linking changes. And because a14y.dev documents its own discovery checks, it is a friendly case for content-based discovery. The relative ordering should generalize; the absolute numbers are tied to this content.
- A snapshot of a moving target. Agent behavior and vendor support change fast. The "no public statement" list and the "no agent fetches by default" finding are true as of mid-2026 and worth re-checking over time.
Method notes
The page-side and nudge figures are from the a14y benchmark llms-txt-linking-2026-06-27 (Claude
2.1.141, isolated Docker, n=5 per arm), with adoption read from the static server's request log filtered to the
agent container's IP. The ~2x figure is from the earlier discovery-layer case study. The agent reached our
test build via in-container HTTP, so it received the full page including the head tags and chose not to act on
them. External claims are linked inline. Full data and the reproducible harness live in the a14y benchmark repo.